FOREWORD


This issue of the Review of Educational Research deals with the construction
and use of tests which evidence pupil growth in abilities. The issue
includes studies of these abilities as measured by tests (Chapter I), and to 
some extent it touches upon habits of behavior, attitudes, and interests. 
These are coming to be recognized more and more in the objectives of in- 
struction, and are therefore closely related to the field of educational testing. 
In the main, however, tests of personality, character, attitudes, and intelli- 
gence are left to the Review dealing with “Psychological Tests and Their 
Uses,” the last issue of which was in June 1938. That issue, and its bibliog- 
raphy, should be considered as supplementary to the present number. 

The treatment in the present issue is concerned with the larger aspects of 
educational measurement. Testing in individual subject fields is not treated 
specifically; one should consult the recent numbers on “Special Methods 
and Psychology of the Elementary-School Subjects” (December 1937), 
“Psychology and Methods in the High School and College” (February 
1938), and “The Curriculum” (April 1937) for measurement as related to 
subject fields. 

One will note that many of the problems in the field of educational testing 
are problems of judgment. This should not be discouraging to research 
workers. Factual material must always fit into a matrix of human thought 
and value if it is to be used advantageously. The thing for research workers 
to feel legitimate concern about is whether they have sensed the real 
problems, and whether they are doing all that is within their power to 
attack these real problems, so that they can turn over to the practical edu- 
cator facts which are as free as possible from error, and which are germane 
to the problems that are basic. 

Douglas E. Scates,
Chairman of the Editorial Board. 

















INTRODUCTION 


This Review of Educational Research attempts a slight change in organization
from previous reports on “Educational Tests and Their Uses.”
This change is due partly to the fact that the contributions of the period 
stress somewhat different aspects of measurement, and partly to the fact 
that it appeared to be unnecessary to duplicate certain areas which recently 
have received entirely adequate treatment. 

Several hundred titles representing the test literature of the period from 
June 1935 to June 1938 were examined, abstracted, and fitted into the 
tentative outline of the report. Limited research contributions in certain 
quarters forced modifications of the outline originally planned. Further 
modification was necessitated by virtue of the very complete and detailed 
report of “Psychological Tests and Their Uses” in the June 1938 Review. 
The inclusion of material on vocational, aptitude, and personality tests, as 
well as an unusually critical and complete treatment of statistical methods 
related to test construction, made it advisable not to duplicate these treat- 
ments in this number of the Review. 

Notwithstanding that this report is basically a review of research, it was
felt that there are trends in the thinking of those most interested in the 
improvement of educational measurement which are not fully represented 
in completed research. Accordingly, and at the risk of some criticism, the 
Committee includes in this review certain trends and basic issues which 
have been expressed or observed during the period covered, but which might 
not be evident in the research itself. 

The chairman of the Committee wishes to thank the members and all 
others interested in this field who by their suggestions and criticisms have 
aided in the development of the outline and of the report itself. 


Harry A. GREENE, Chairman, 
Committee on Educational Tests and Their Uses. 







CHAPTER I 


Studies of Educational Achievement*


PAUL V. SANGREN 


Dependence upon Validity of Measurements 


A number of writers rightly question the value of experimentation in
which wholly or partially inadequate tests are used as measures. Orata (63) 
pointed out that “we cannot be progressive in our teaching and remain 
basically traditional in our testing.” J. S. Gray (40), in a similar vein, 
said: “The adequacy of objective tests for evaluating hypotheses in educa- 
tion is obviously much inferior to that in the more basic sciences. It is more 
difficult to control the variables. However, a part of research training for 
this step is to learn the better methods of testing and their limitations. 
Educational research is in dire need of more objective and more adequate 
methods of testing our great mass of unproven theories.” Payne (67) made 
similar remarks in regard to evaluation in higher education. In an indict- 
ment of meaningless and superficial research he said: “I have not made 
these remarks about experiments just to be critical. . . . I have made them 
primarily because we have no very accurate ways of measuring the results 
of the experiments. . . . If, then, we wish to advance education from the 
point of view of science, the greatest need of research in the field of higher 
education is in the field of measurement.” Newland (61) blamed both 
college educators and research workers for the dearth of research at the 
college level. The former, according to Newland, cannot agree as to the 
function of higher education, and the latter hold too narrow a conception 
of measurement. Peik (69), although speaking particularly with reference 
to experimentation in curriculum research, stated an opinion which is also 
applicable to educational research in general. 

The value of innovation in experimental groups was determined by initial and final 
measurements of status. This method of evaluation through tests depends upon the 
reliability and validity of the instruments of measurement. Whereas testing is scientific 
in theory, and experimentation is a basic approach to curricular development, this type 
of research is still limited in value through a lack of instruments that are wholly valid 
and sufficiently refined. In some important sectors tests of outcomes are yet either lacking
or in a very crude state of development. Suitable tests are essential to the progress
of research through measurement. . . .


Evaluation of Achievement Tests 


A comparison of the New Stanford Achievement Test and the Modern
School Achievement Tests was made by Woolf and Lind (97). Results
showed that the Stanford Achievement Test indicated a slight superiority
in statistical evaluation and tended to give slightly higher age and grade
levels according to means obtained in a comparison of individual tests.
The writers concluded with the statement that the two tests are so similar
in content and in their correlation with other tests that the use of either is
justified. A study by Pullias (72) reported variability in results from objective
achievement tests. Thirty-five teachers constructed 63 examinations in
geography, history, and health. These were paired and administered to 68
groups. Three standardized objective tests in each of seven subjects and
two in each of three subjects were also administered to 460 sixth-graders.
Conclusions to be drawn from this study are, according to the author, that
(a) a test may be objective in the sense that personal opinion is eliminated
in scoring and still fail to remove important personal elements from the
evaluation of pupils’ achievement; (b) measures of pupil achievement obtained
from different informal objective tests may be expected to vary to a
considerable extent; and (c) pupil ratings based upon standardized tests
show marked disparity.

* Bibliography for this chapter begins on page 554.

H. A. Gray (39) studied tests recorded and reproduced by sound. Three 
hundred and seventy pupils in Grades II through VIII were subjects. Tests 
were adapted from the arithmetic reasoning, language usage, and spelling 
sections of the Modern School Achievement Tests, and arranged in both 
sound record and mimeographed form. After analyzing the scores made 
on the two types of test, Gray concluded that “the pupils in question were 
able to detect errors in spoken language more readily than they could in 
the printed form.” The results were not conclusive, however, and more 
experimentation in this field is needed. 


Studies of General Achievement 


In a study of variations in educational accomplishment in a number of 
school subjects among children of normal intelligence, Stout (81) reported 
that although the intelligence quotients of children may be normal it is 
clear that they are not as a rule homogeneous in other traits. For this study 
a wide range of rating scales and aptitude, achievement, and mental tests of 
both the individual and the group types were used. 

Schrepel and Laslett (78) tested 121 junior high-school pupils in 
Grades VIII and IX with the New Stanford Tests in the spring and again 
in the fall to check the loss of knowledge during the summer vacation. 
The results showed no serious losses with the possible exception of arith- 
metic computation. The authors concluded with the suggestion that stren- 
uous reviews in the autumn are psychologically and pedagogically ques- 
tionable. 

Experiments comparing the educational achievement of various groups 
are reported by several authors. For example, Breidenstine (6) reported 
a study concerning the educational achievement of pupils in differentiated 
and undifferentiated groups. The New Stanford Achievement Tests were 
used to measure educational achievement. A summary indicated that both
groups were about equal in mean composite score, and that both were
almost equal in E.Q. The undifferentiated group was slightly superior 
at every level, however. Moore (57) reported a study of the educational 
achievement of delinquent and dependent boys. The Otis Self-Administer- 
ing Test, Intermediate and Higher Examinations, and the Modern School 
Achievement, Short Form (spelling omitted), were administered. The 
author stated, “When the subjects in this study are compared with local 
norms of retardation, normalcy, and acceleration for chronological age, 
both groups are considerably below the Tennessee school children.” In 
fact, the dependents showed twice as high a retardation, and the delin- 
quents three times as high. Scholastic difficulties of the children of immi- 
grants have been studied by Overn and Stubbins (64). Native American 
children made up the control group and children from foreign language 
speaking homes made up the experimental group. The Pintner-Cunningham
Primary Mental Test and the Metropolitan Achievement Tests were
used. The authors concluded that “special procedures should be found to 
aid the entrants to the first grade from foreign language speaking homes 
to overcome language handicaps as rapidly as possible.” 

Smeltzer and Adams (79) reported a study of the educability of tran- 
sients. Forty-three transient boys, who were admitted to the Industrial 
School at Lancaster, Pennsylvania, to receive an extensive six months’ 
course in vocational education, were administered the Terman Group Test 
of Mental Ability, Form A, the New Stanford Achievement Examination, 
Advanced, Form V, the Stenquist Mechanical Aptitude Tests I and II, and 
the Thurstone Personality Schedule. Final tests, administered to the twenty- 
five remaining subjects, consisted of other forms or repetitions of the same 
form when only one was available. Results showed that “contrary to many 
subjective statements regarding the characteristics of transient adolescents 
who have taken to the ‘road’ during the past few years, many of these 
boys appear to be typical members of a population with unfortunate 
situations in home environment. . . . These transient boys represent a 
group of late adolescents who are definitely educable.” 

Traxler (87) and Ross (77) reported experiments involving the rela- 
tionship of achievement to various other factors. The former was concerned 
with the correlation of achievement scores and school marks. The Cooper- 
ative Test Service Examinations were used for 20 school subjects and 
reading was measured by the Nelson-Denny Test. Marks assigned on a 
percent basis during the year 1934-35 in eight independent secondary 
schools for boys were also used. Traxler stated, “The general conclusion 
to be drawn from this article is that, although intelligence and reading 
skill, as measured by the American Council Psychological and the Nelson- 
Denny tests, operate to raise the correlation between scores on achievement 
tests and school marks, a positive and significant degree of relation usually 
exists aside from the influence of either of these facts.” Ross (77) attempted
to determine the relationship between intelligence, scholastic
achievement, and musical talent. Subjects were 1,541 pupils in Grades V
through XII in the California public schools. Intelligence was measured 
by the Terman Group Test of Mental Ability, achievement by the Stanford 
Achievement Test, Form V, and musical talent by the Seashore Tests of 
Musical Talent. Results showed a low degree of relationship between intel- 
ligence and musical talent. Relationship between scholastic achievement 
and musical talent is so low that a measure in one cannot be used to 
predict a score in another. 


Prediction of School Success 


Factors useful in predicting educational success have been the subject 
of various studies. Dunlap (24) reported a study of preferences as indi- 
cators of specific academic achievement. He concluded, “It would seem 
that if the preliminary form of the preference blank described were refined 
and extended, the expressed preferences of an individual could be used 
to increase materially the accuracy of the prediction of future academic 
success at the junior high-school level.” Ficken (30) studied the subject 
of predicting achievement in the liberal arts college. He found that the 
Minnesota College Aptitude Test correlates higher with grade point aver- 
ages at the end of one semester and at the end of one year in the case of 
women than in that of men. Unsegregated College Aptitude Test scores and 
grade point averages show low correlations. However, high-school rank 
and grade point averages gave a correlation of .67. Ficken stated, “One 
cannot avoid the conclusion that the College Aptitude Test is of doubtful 
value for general prediction purposes at this institution. . . . It does not 
follow that the test is not a good one for other purposes, such as, for ex- 
ample, the motivation and guidance after matriculation of those students 
who are not working up to capacity.” 

Variations among high-school seniors in promise and performance meas- 
ures were studied by Eckert and Mills (27). Four hundred and forty-two 
high-school seniors in the college entrance curriculum in the Buffalo high 
schools were the subjects. Achievement was measured by the average on 
the Regents’ examinations; promise was measured by a lengthened form 
of the American Council Psychological Examination. Other tests were 
administered as follows: the history and English parts of the Iowa High- 
School Content, the Strong’s Vocational Interest Blank for boys, the Manson 
Occupational Interest Blank for Women, the Neumann Test of Interna- 
tional Attitudes, Zyve’s Stanford Scientific Aptitude Test (modified), and 
Willoughby’s Revision of the Thurstone Personality Schedule. Results 
led to the following conclusions: (a) Teachers’ marks are as good a 
criterion for differentiation as the Regents’ examination. (b) Performance 
in mathematics and science is more indicative of scholastic application 
than marks in other fields. (c) Measures of studiousness show some of 
the clearest differences. 

Peck (68) was concerned with the relationship of drawing performance
and school achievement of young children, and with special factors influencing
that relationship. A non-verbal group test (Prediction Test) based
solely on drawings of beginning children was administered to 1,000 chil- 
dren. Correlations with Gates Reading Tests, a flash-card word test, and 
teachers’ estimates of school achievement led to the conclusion that “chil- 
dren’s drawings, obtained in the manner chosen for the study, probably 
furnish as accurate prediction of school success as do other predictive 
measures in common use.” 


Achievement in Reading 


Reading readiness—Hilliard and Troxell (48) reported a study of in- 
formational background as a factor in reading readiness. Tests used were 
the Sangren Information Tests, Smith Vocabulary Test, and the Healy 
Picture Completion Test. Reading readiness was measured by the Lee-Clark 
Reading Readiness Test and the Stone-Grover Classification Test. The Gates 
Primary Tests were used to measure reading status at intervals. The rich 
background group was decidedly ahead of the meager background group 
in sentence and paragraph reading, and only slightly superior in word 
recognition. Wilson and Burke (95) studied reading readiness and later 
achievement. The Van Wagenen, Metropolitan, and Stone-Grover Tests
were administered to first-grade pupils. These scores were correlated with 
fourteen reading tests, teachers’ November ranking of predicted achieve- 
ment, and teachers’ May ranking of actual achievement. Teachers’ Novem- 
ber prediction gave the highest correlation with reading ability as later 
measured by tests. A continuation of this study is also reported, with 
similar conclusions (94). 

Grant (38) compared the predictive value of the Metropolitan Readi- 
ness Tests with the Pintner-Cunningham Primary Mental Test. Two hun- 
dred and sixty first-graders were subjects. Two years after the administra- 
tion of these two measures, the Gates Primary Reading Test was admin- 
istered. To those making perfect scores on this measure, the Metropolitan 
Achievement Test in Reading was administered and in some cases the 
DeVault Primary Reading Test. Results indicate that the Metropolitan 
Readiness Test measures factors significantly related to later success in 
reading skills and that it is on a par with the Pintner-Cunningham in 
providing a basis for prediction. 

Dolch and Bloomster (18) reported an experiment involving phonic 
readiness. General maturity was measured by the Pintner-Cunningham 
Primary Mental Test and the Detroit First-Grade Intelligence Test. Tests 
1 and 2 of the Basic Reading Tests, Word Attack Series, were used to 
measure the use of phonics. Results showed that children below seven 
years of age made only chance scores in phonics. These results, according 
to the authors, may indicate a minimum age for teaching phonics. 

Studies of primary grade reading—Hill (47) studied the process of word 
discrimination in individuals beginning to read. Pupils were re-examined
after twelve or fourteen weeks of training, with achievement tests which
consisted of the actual reading material presented to the subjects during 
the training period. Training tended to direct attention to the beginning 
of words, and even though individual patterns were somewhat affected, 
the group pattern of difficulties showed little change over the training 
period and under the teaching methods employed. Donnelly (20) studied 
the growth of word recognition in Grade I. Three hundred and eighty-nine 
first-graders were given a test of 150 words chosen at random from the 
first two levels of the Gates Vocabulary List. Of thirty pupils in the bot- 
tom tenth at the end of the third month, twenty were in the same position 
at the end of the year. Girls were superior to boys. The author concluded 
that there is need for more inventory testing to determine mastery of basic 
vocabulary. 

McDade (53), in an experiment with first-graders, eliminated oral reading
entirely. Scores on the Gates and Metropolitan Primary Reading
Tests at the end of the year were gratifying. Comparisons with
classes taught by other methods were favorable to the non-oral reading 
group. Individualizing instruction in reading as tried by Worlton (98) 
in the Salt Lake City schools showed significant gains as measured by the 
Gates Reading Tests, Types A and C. Games as a means of teaching arith- 
metic and reading were considered by Goforth (36). Read-O and Add-O 
were the games employed with second-grade pupils. Marked gains were 
reported together with increased interest. Wheeler (93) also reported a 
study involving the Read-O games. Experimental and control groups were 
set up in the first grade on the basis of the Dearborn Intelligence Tests. 
A final measure of reading was obtained through the use of the DeVault 
Standardized Reading Tests. Results showed a high correlation between 
the word recognition of Read-O and general achievement. The relation- 
ship of pictures to reading was studied by Miller (55). Initial and final 
tests on three stories were constructed; no significant differences for the 
use of pictures were found. 

Reading in the higher grades—Traxler (88) reported the results of an 
experiment in teaching corrective reading to eight seventh-grade pupils 
for nine weeks. According to results of tests, seven “derived considerable 
lasting benefit from the instruction.” Garrison (33) was also interested 
in a remedial reading program with pupils in the ninth grade. Sangren- 
Woody Tests were used as initial and final measures. Out of 33 pupils, of 
whom all but two were reading at the sixth-grade level, all but four 
showed improvement. 

Seeking new methods and objectives in teaching dull-normal pupils 
of the upper grades to read, Walcott (92) employed the work-sheet method 
to facilitate the study of subjectmatter and to cultivate skills constituting 
reading ability. A retest with the Iowa Silent Reading Test seemed to 
substantiate the use of this method. Another experiment employing three 
instructional units yielded encouraging gains for low average pupils.
Gates and Bond (34) studied five “dull-normal” and one remedial reading
group in the Speyer Experimental School. The Modern School Achieve- 
ment Test was initially administered in February and finally in June. No 
formal classes were held in subjectmatter areas, but special attention was 
given to the developing of competence in reading. Reading gains in all 
classes were large, ranging from six to thirteen months. Additional meas- 
ures were applied in the form of the Gates Primary and Advanced Reading 
Tests. The average I.Q. represented by the classes was 82.

Cramer (13) reported a study involving the administration of reading 
and arithmetic tests, constructed and administered in Australia, to 1,000 
California and Washington pupils. Results indicated a tendency for Aus- 
tralian children to score better in arithmetic and for American children 
to score better in reading except in Grades VII and above. 

Vocabulary studies were reported by Anderson and Fairbanks (1) and 
Tilley (84). The former were concerned with differential factors in read- 
ing and hearing vocabularies of university freshmen. The Inglis Tests of 
English Vocabulary were used. Conclusions are that “vocabulary ability 
is a centrally determined function, operating, on the average, independent 
of the mode of presentation of material.” The study by Tilley reported a 
technic for determining the relative difficulty of word meanings among 
elementary-school children. A multiple-response test was constructed from 
the Survey Tests of Vocabulary, the Lower Extension of the Inglis Vocab- 
ulary, and eighty words of the vocabulary section of the Stanford Achieve- 
ment Examination. A self-appraisal technic was also employed. This tech- 
nic apparently had a rather high validity which varied as did the intelli- 
gence of the children. 

Bilingual children—Reading and arithmetic abilities of Spanish and 
English speaking children have been studied by both Manuel (54) and 
Kelley (49). The former was concerned with both tool subjects and reported that the average
Spanish speaking child suffers a serious and persistent language handicap
at least as high as the eighth grade. Kelley found results similar to those
of Manuel; the Spanish speaking children were below the norm in all
grades, with deficiencies not confined to any particular phase of reading
ability. 

Reading and various factors—Tinker (85) studied the reliability and 
validity of eye movement measures of reading. Seventy-seven university 
sophomores and fifty-seven freshmen were subjects. Where group compari- 
sons are concerned, eye movement measures for as few as five or six lines 
have adequate reliability; in individual diagnosis, however, twenty or 
more lines are necessary for adequate reliability. Tinker and Paterson 
(86) studied typographical factors influencing speed of reading on the 
Chapman-Cook Speed of Reading Test. Cloister black retarded reading 
about 16 percent. Traxler (90) found no significant differences between 
the sexes in rate of reading in high school. Anderson and Tinker (2) 
studied speed in reading performance of 110 college sophomores tested
individually on the first five parts of the Iowa Silent Reading Test. They
found that “the data justify the conclusion that when an adequate method 
of measurement is employed, there is an intimate relation between rate 
of reading and comprehension scores for the type of material here con- 
sidered.” 

Pace (65) studied laterality and reading ability in high school and 
college. Laterality was determined by means of a questionnaire. The Minne- 
sota Reading Examination, Form A, and the Minnesota Speed of Reading 
Test, Form A, were administered. Students who were shifted, ambidextrous, 
or left-handed showed no significant inferiority on the Minnesota Reading 
Examination, but some inferiority was apparent for this group on the 
Minnesota Speed of Reading Test. 

The relation of ability in reading to success in other subjects was investi- 
gated by Finck (31) by means of the controlled group technic. For twenty- 
two pairs from Grades IV through VIII it was found “that improvement 
in ability to read is accompanied by improved achievement in those sub- 
jects which involve a great deal of reading.” McCullough (52) conducted 
a similar study with ninth-graders. 

Swanson and Tiffin (83) were interested in the relationship of perform- 
ance on the Betts Telebinocular and the Iowa Silent Reading Test of 267 
freshman men. Conclusions reached seem to indicate that it is “improbable 
that differences in visual efficiency are causally related to differences in 
reading ability among college students. . . . This statement is equally 
true whether intelligence is left uncontrolled or whether it is held constant 
by means of partial correlation.” Witty and Kopel (96) have used the 
Betts tests of visual sensation and perception and of oculomotor and per- 
ceptual habits in an effort to determine the relationship of poor reading 
to reversals, fusion difficulties, muscle imbalances, and mixed eye-hand 
dominance. This latter factor was measured by a modification of the Koch
handedness questionnaire, the manoptoscope, dynamometer, and other 
devices. Subjects were 100 public school children, with I.Q.’s of 80 or 
above, whose reading scores were lowest among 2,000 pupils in Grades 
III to VI, inclusive, on several standardized tests. The Metropolitan and 
Gates Tests indicated that the control group was one and one-half grades 
better in reading. Conclusions reached were that “the etiology of reading 
disability (as an entity) lies in no single visual (or other noumenal) 
factor.”
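
The partial correlation by which Swanson and Tiffin report holding intelligence constant can be illustrated briefly. What follows is a minimal sketch of the standard first-order partial correlation formula, not the authors' own computation; the function names and all of the scores are hypothetical and are chosen only to show the arithmetic.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical scores for six students (not data from any study cited here).
vision = [4, 7, 5, 9, 6, 8]
reading = [52, 60, 55, 71, 58, 66]
intelligence = [98, 105, 101, 118, 103, 112]

r_vr = pearson_r(vision, reading)
r_vi = pearson_r(vision, intelligence)
r_ri = pearson_r(reading, intelligence)

# Relation of vision to reading once intelligence is partialled out.
print(round(partial_r(r_vr, r_vi, r_ri), 3))
```

When the third variable is unrelated to the other two, the partial correlation equals the original correlation; the more the third variable is related to both, the more the two figures can differ.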


Achievement in Arithmetic, Algebra, and Geometry 


Porter (71) reported a three-part experiment to determine the effect
on achievement in geometry and algebra of spending one class period a
week on mathematical recreation. The Otis Self-Administering Test, the
Lane-Greene Unit Achievement Tests in Plane Geometry, the Hotz First
Year Algebra Scales, Hart’s Geometry Tests, the Columbia Research Bureau
Plane Geometry Tests, a county eighth-grade examination, and the
Silance-Remmers Scale for Measuring Attitude toward Any School Subject
were administered. Results showed that the experimental groups were
most interested, and the conclusion reached by the author was that the
use of mathematical recreation in class work is advantageous for achievement.

December 1938 Studies of Educational Achievement

Morrison (59) conducted an experiment to evaluate the mass method 
versus the individual method in teaching multiplication to fourth-grade 
pupils. Sixty-two pupils were taught according to the mass method and 
75 pupils were taught according to the individual method. In October the 
Wilson Process Test in Multiplication, 5P, was used as an initial test. 
It was repeated as a final measure in March and as a retest in the following 
September. Forty-seven periods of 45 minutes each were used. The indi- 
vidual group used the Wilson Drill Book in Multiplication. Results showed 
that the individual method group experienced a slightly greater gain and 
also made a greater permanent gain. Betts (3) reported the results of a 
study involving the use of a calculating machine for arithmetic instruction. 
Thirteen pupils in Grade VIA were subjects. Tests used were the Compass 
Survey Test and the Compass Diagnostic Test (XVIII, Problem Analysis). 
The author reported: “Limited test data secured so far probably are of 
value only to the degree that they permit a forecast of factors to be con- 
trolled in a more extensive investigation. . . . The typical instructional 
material for the sixth grade does not provide enough problem drill to 
justify extensive use of calculating machines. Further investigation will 
probably be conducted on a higher level.” 

Harap and Barrett (44) undertook to discover whether fundamentals 
could be learned in an arithmetic activity program for Grade III in which 
integers are studied. Ten activity units were covered in the year. The Los 
Angeles Diagnostic Test in Fundamentals of Arithmetic gave an average 
grade score of 4.1. Harap and Barrett said, “These results confirm our 
earlier findings that the fundamentals can very satisfactorily be learned 
in a program of arithmetic units based on real situations in which arith- 
metic is learned as it is used, not before it is used.” The direct and indirect 
methods of teaching the addition combinations were studied by Breed and 
Ralston (5). The controlled experiment procedure was used in Grades I 
and II. Tests used were the Otis Group Intelligence Scale, the Buswell- 
John Addition Test, and an Initial Combinations Test (non-standardized). 
In Grade I the Courtis Standard Research Test, Series A, was also admin- 
istered. Results show that the indirect method is better in complex addition 
and as good or better in addition combinations. 

Brownell and Watson (7) conducted an experiment on the comparative 
worth of personal interviews and the analysis of tests as diagnostic methods 
in arithmetic. The test used was a modified form of the Brueckner Diag- 
nostic Test in the Addition of Fractions. Results showed that the personal 
interview and the analysis of tests were about equally effective in identifying
the total number of faults for the entire group. When the diagnosis
is that of the difficulties of individual children, the personal interview is
both more reliable and more valid. Feder (29) attempted to determine the 
effect of directions, and arrangement of items, on student performance 
on the number series test of the Mathematics Aptitude Test, Form X, of 
the Iowa Placement Examinations. Three forms were constructed. One was 
the original, another retained the original order with improved directions, 
and a third was a power test with original directions. Results led to the 
following conclusions: (a) clear-cut directions are superior to longer, 
more detailed explanations; (b) when arranging items in order of
difficulty, care should be taken that items are not grouped according to
common basic principles which are not found in succeeding groups. 

Grossnickle (41) reported a study involving concepts in social arith- 
metic for the eighth-grade level. A list of 68 mathematical concepts in the 
business usages of arithmetic found in a majority of 13 different textbook 
series for the seventh and eighth grades was incorporated in a
four-response multiple-choice test. The tests were administered to 1,337 pupils
completing the eighth grade in 8-4 plan schools. The level of attainment 
in mastery of concepts in most schools was about the same; in only one 
school was the average attainment more than two-thirds the total possible 
score. 

Gundlach (42) conducted an experiment with a twofold purpose: (a) 
to give information concerning the nature of the curve of growth in ability 
to work types of examples in common fractions for pupils in Grades VI
to XIII, inclusive; and (b) to determine to what extent the factor of 
mental capacity affects the curve of growth in ability of pupils of the 
secondary school to work types of examples in the operations of common 
fractions. Gorman (37) administered a 215-item test of arithmetic vocab- 
ulary to 92 teachers in Grades I through VI, and to thirty students enrolled 
in a class in the teaching of elementary arithmetic. He found a significant 
difference between the arithmetic vocabularies of experienced teachers 
and students in elementary education, and that a significant proportion 
of both teachers and students manifest a lack of understanding of a number 
of important signs and abbreviations in arithmetic, and both need to 
improve in the ability to define in simple language many of the basic 
concepts. 


Achievement in English 


Traxler and Anderson (89) gave two forms of an essay test in English 
to high-school pupils in a carefully controlled situation. The papers were 
scored by two individuals. Results showed that the reliability of the reading 
of the papers was high, but that the reliability of pupil performance was 
relatively low. 

Wagner and Strabel (91) attempted to determine what measures, avail- 
able at college entrance, best predict subsequent performance in English. 
Grades in high school and on New York Regents’ examination, scores on 
the American Council Psychological Test, the Iowa High-School Content
Examinations, the Cooperative English Test, and the Nelson-Denny Read- 
ing Test were studied. General conclusions seem to be that “college Eng- 
lish performance may be predicted about equally well by a measure of 
secondary school English, a secondary school language, general high school 
performance, or the Cooperative English Test. Vocabulary seems especially 
important for success of boys; general information for girls.” 

Lowen (51) reported an experiment in which she attempted to leave 
environment as the one variable affecting the output of poetry by two 
groups of children. Environment was judged by Sims’ Score-Card, home
visits, and the principal’s survey. Lowen said, “In this experiment environ- 
ment made no appreciable difference in the quality of poetry produced.” 

Netzer (60) was concerned with an attempt to evaluate pictures, incom- 
plete stories, and objects as stimuli for oral language. Responses of 
fourth-, fifth-, and sixth-graders were recorded electrically through micro- 
phonic equipment. Subjects responded best to objects, next to stories, and 
third to pictures. An attempt to rate the compositions on the Thorndike 
Extension of the Hillegas Scale proved unsatisfactory with the result that 
oral composition scales were developed. 

Garnett (32) reported a study of the status and improvement of college 
freshmen in certain skills in English composition. Six tests were admin- 
istered in three teachers colleges. The author concluded that “only a small 
number of students are adequately prepared for the high art of teaching 
the basic skills in written English.” 


Achievement in Foreign Language 


Wrightstone (99) reported that the scores of 125 pupils taught according 
to newer type practices were superior in Latin to those of pupils who had 
been taught according to standard practices. The superiority, however, was 
not statistically significant. De Sauze (15) reported the results of the 
American Council French Test and parts of the Cheydleur French Test 
administered to students in the Cleveland schools where the teaching of 
foreign language involves a multiple or eclectic approach. Results of the 
reading versus the eclectic method showed the latter to be superior in 
Cleveland schools for the second and fourth semesters in total achievement 
and also when the silent reading scores alone are considered for fourth- 
semester pupils. 

Stalnaker and Kurath (80) constructed two twenty-minute tests in Ger- 
man vocabulary which are claimed to be highly reliable. One was a best- 
answer recognition test and the other a context recall test. Results of 
administration to 184 students in elementary German showed the context 
test to be slightly more reliable and to be preferred by a slightly greater 
number of the subjects as a fairer test. Both tests appear to measure very 
nearly the same abilities.


Measurement of Spelling Ability


Northby (62) compared five types of spelling tests for diagnostic pur- 
poses. Twenty words were selected from the Iowa Spelling Scales and 
administered to forty-three sixth-graders in story form, timed dictation, 
list form, multiple choice, and orally. Timed dictation and story form were 
most difficult, and multiple choice the easiest; the list form appears best 
for diagnostic purposes. Stuit and Jurgensen (82) conducted a similar ex- 
periment with freshman students at Carleton College. The Cooperative 
English Test calling for the identification of 53 misspelled words in eight 
themes, and a dictation test presenting these same 53 words were admin- 
istered. Students had a tendency to score higher on the dictation test. The 
authors concluded, “All factors considered, it seems that a test requiring 
a student to write dictated sentences would be more valid as a test of 
spelling ability.” 


Growth in Handwriting 


Conard (11) studied the influence of manuscript writing and type- 
writing on the development of 150 children in the second, third, and 
fourth grades. Results of other subjects were noted. The manuscript tests 
were scored according to Conard Manuscript Writing Standards and one 
point was deducted for each typing error. “As a result of the study . . . 
it appears that the typewriter is influential in developing the children’s 
creative writing, does not affect handwriting detrimentally, but appears 
to stimulate both quality and speed in handwriting, and has a minor 
influence on other subjectmatter.” Goetsch (35) was concerned with the 
effect of the shift from manuscript to cursive writing upon the pupils’ 
writing and composition in the intermediate grades. Cursive writing was 
taught in all grades of control cities; manuscript in Grades I and II, and 
cursive in Grade III, of experimental cities. Specimens were analyzed 
according to the Kansas City Scale for Measuring Handwriting and the 
Nassau County Supplement of the Hillegas Scale for Measuring the Quality 
of English Composition. The data showed no evidence that either type of 
early training leads to a better quality of composition in the later grades. 


Achievement in Science 


Buckingham and Lee (9), using true-false tests and organization tests 
in natural science, showed that college freshmen may use memory alone to 
secure high scores on a true-false test and still be unable to see the relation- 
ship between items and a central thesis. They suggested that it is necessary 
to go a step beyond the objective testing of correctness alone to meaningful 
organization. In a similar study, Downing (23) devised a test to measure 
skill in the use of some of the elements of scientific thinking and the safe- 
guards that are needed. Fifteen questions were administered to over 1,000 
pupils in Grades VIII through XIII in science classes only. Conclusions are 
“that ability to think scientifically is a complex of a number of component 
abilities and that these develop at varying rates and differently in different
communities.” 

Peterson and Douglass (70) compared the results of using published 
workbooks with pupil-made notebooks in general science. Pupils were 
paired on the basis of the Otis Group Intelligence Test, chronological age, 
and scores on an objective test given at the beginning of the year. Progress 
was measured by assignment tests over units of the course, an objective 
test at the end of the first semester and at the end of the third semester, 
and the New York Regents’ examination in general science. During a 
second year the Ruch-Popenoe General Science Test was administered as 
an initial and final test. Although there were only two significant differ- 
ences in favor of the notebook method, six other differences favored this 
method. Ruch-Popenoe results, although not yielding a fully reliable 
difference, favored the workbook section. 

Gutzeit (43) reported the results of teaching an abstract concept in 
science by means of the motion picture. The controlled group technic was 
used. Ten-minute tests over elementary molecular and atomic theory were 
constructed. Results showed subjectmatter to be within the conception 
range of the eighth grade. The tests were too simple and brief to gauge 
the effectiveness of teaching, however. 

Rosenlof and Wise (75) compared the relative achievement of pupils 
in courses in physical science and in physics and chemistry on the basis 
of three different factors. Pupils were paired on the basis of the Otis Self- 
Administering Test of Mental Ability, Noll’s What Do You Think Test, 
and a comprehensive physical science test devised for this study. This 
latter test was also used as a final test together with the Cooperative 
Physics Test, and the Cooperative Chemistry Test. The authors felt that 
as a result of this study a fusion of physics and chemistry is possible 
and desirable. 

Dickter (17) studied the relationship between scores on the scholastic 
aptitude test and college marks in chemistry at the University of Penn- 
sylvania. The author concluded that results on the mathematics section 
showed enough promise to warrant its continued use in the scholastic 
aptitude tests. 

Burnett (10) reported the results of an experiment in the problem 
approach versus the recitation method in the teaching of biology. The 
Hoff Scientific Attitude Test was administered at the end of six weeks, the 
Ruch-Cossman Biology Test at the end of twelve weeks, and comprehensive 
formal objective tests at the end of each unit. Results favor the problem 
approach throughout. Another study of teaching methods was made by 
Douglass and Fields (21) who compared the merits of the daily assign- 
ment-daily recitation and the unit assignment methods of teaching high- 
school chemistry. Results of the Powers General Chemistry Test and an 
experimental test of high reliability led to the conclusion that neither 
method was distinctly superior and that factors other than method were
responsible for differences in final scores. 


Achievement in Social Studies 


Reilley (73) conducted an experiment in an effort to determine the 
interest of high-school seniors in politics. Tests were constructed from 
material of current newspapers and magazines. The tests were adminis- 
tered to seniors and also to freshmen in a school where the development 
of political interests had been one of the objectives of the social studies 
course. Results show that an interest in political matters may be developed 
if the school authorities make definite provision for it. 

Herrick (46) proposed certain instruments for the evaluation of pupils’ 
thinking concerning current social questions. Although at the time of his 
article the tests had not actually been tried in the classroom, the author 
believes that if a high correlation is obtained between the pupils’ ability 
to judge the soundness of the arguments of others, and their own position 
in using sound arguments, the proposed instruments may be a valid means 
of measuring the quality of one aspect of a pupil’s thinking. 

The value of the Stone Reading Test, the Otis Group Intelligence Scale, 
Advanced, and the Wesley Social Terms Test as instruments for predicting 
achievement in United States history was investigated by Bolton (4). The 
Wesley Test was found to be significantly better for purposes of prediction 
than either of the other two. 

Congdon (12) studied papers written for entrance examinations to 26 
colleges, in an effort to ascertain the differences in achievement in geogra- 
phy, civics and history, and general science, between students from dif- 
ferent sections of the country, and from rural and urban populations. 
Results show that geography and general science are affected both by 
locality and population; differences for civics and history are not statis- 
tically significant; locality exerts the greater influence on geography, and 
rural-urban status upon science. 

Douglass and Pederson (22) reported an experimental study to deter- 
mine the value of large units versus daily assignments in eight sections 
of American history in high-school classes. Initial tests were devised by 
the experimenter. Twelve weeks of instruction were followed by an ob- 
jective test devised by the experimenter, and the Iowa Every Pupil Test in 
American History. Results seem to point to the superiority of the large 
unit plan. 

Park and Stephenson (66) studied the value of visual aids in teaching 
language arts and social studies. Two groups of Grade VIIB pupils, fifteen 
in each group, were administered a fifty-item objective-type test over the 
unit to be taught. Progress during the experiment was checked at regular 
intervals by job tests. Results of final tests show that visual aids are 
worthwhile and that a close correlation of language arts and social studies
and literature makes for a better understanding and appreciation of the
material. 


Music Ability and Achievement 


Ross (76) found 428 Indian children in Grades VI through XII to be 
inferior to white children on the Seashore Music Talent Test. Japanese 
children, however, compared favorably at all grade levels with the white 
children. 

Dean (16) used the Seashore Musical Talent Tests and the Terman 
Group Test of Mental Ability to determine their value in predicting the 
success of entering students in the Eastern Montana State Normal School 
in required courses in sight-singing and ear-training. Results show that 
intelligence is not as important as prior musical training and that the 
Seashore tests of pitch and memory were most predictive. Farnsworth 
(28), after attempting to determine the relative values of music capacity 
tests and intelligence tests in the prediction of music grades for college 
students, concluded that “music capacity and intelligence tests have 
variable potencies in the prediction of music grades.” He calculated 
correlations between the Thurstone Psychological, Iowa Placement,
Seashore Sense of Pitch, and Seashore Tonal Memory tests, and grades in
music theory, and history and appreciation of music. Lamp and Keys 
(50) conducted a somewhat similar experiment to determine whether or 
not aptitude for specific musical instruments can be predicted. The Terman 
Group Intelligence Test and the Seashore Tonal Memory and Pitch Dis- 
crimination Tests were administered and certain physical measurements 
were taken of 151 ninth-grade pupils. Conclusions are that the Seashore
Tests do not yield an index of aptitude for brass, woodwind, or stringed 
instruments adequate for individual guidance. The predictive value of the 
Terman test is even less. Teeth evenness and length or slenderness of fingers 
bear no appreciable relationship to performance. A combination of meas- 
ures, however, proved to be of sufficient predictive value for the brass horn 
to be of use for guidance purposes. 

Rigg (74) was interested in the relationship of discrimination in music 
to discrimination in poetry. Seventy-one college men were given the 
Oregon Music Discrimination Test, the Rigg Poetry Test, and the Ameri- 
can Council Psychological Examination. Intercorrelations were low. 


Teaching Conditions and Achievement 


Eastburn (25, 26) reported studies of the relation of class size and the 
efficiency of instruction. English and history were the subjects considered 
in an early investigation. Pupils were paired on the basis of the Terman 
Group Test of Mental Ability, the Columbia Research Bureau American 
History Test, and the Columbia Research Bureau English Test. Grade
points, age, and sex were also considered. Initial tests given to groups
were the Columbia Research Bureau American History Test, the Iowa
General Information Test in American History, the Columbia Research 
Bureau English Test, the Iowa Placement Test in English Training, and 
the literature section of the Iowa High-School Content Examination. Dif- 
ferent forms of these examinations were used as final measures. The Hand- 
Carley Student Reaction Form was also administered. Results of this study 
and of a later investigation employing similar tests led to the conclusion 
that “since some teachers can handle large classes as effectively as small 
classes, it becomes the responsibility of the school administrator to deter- 
mine which of these teachers can teach large classes effectively and in 
what subjects and on what ability levels.” 

Crawford and Carmichael (14) reported the results of a study involving 
three years with home study and three without in Grades V to VIII. Results 
on the Stanford Achievement Test revealed no significant differences, al- 
though pupils without home study showed a significant drop in high-school 
marks. 

Herr (45) conducted an experiment in which 97 junior high-school 
pupils were allowed to reduce the three-year course to two. Evaluation 
was made by administering certain standardized tests to three senior high- 
school classes and comparing the experimental with the control groups. 
The Columbia Research Bureau Tests in Plane Geometry were adminis- 
tered to the sophomores; the Columbia Research Bureau Tests in Chem- 
istry and American History to the juniors; and the Purdue Diagnostic 
English Test and Peters’ Test of General Information were used with the 
seniors. Results show that “so far as scholastic achievement in senior high 
school is concerned, no outstanding differences in the achievement of the 
two groups are found. . . . It must be noted, however, that the measures 
for which the differences are significant, are in favor of the rapid progress 
group.” Other factors studied were extra-scholastic activities and social 
adjustment, which showed no significant differences. 

Morgan (58) was concerned with evaluating the seminar method in a 
course in elementary educational psychology for superior students. Two 
control groups and an experimental group were used. Matching was done 
on the basis of the Thurstone Psychological Test and results of a final 
examination in elementary psychology. The final examination in educa- 
tional psychology showed a tendency for the experimental group to be 
superior.

Wrightstone (100) provided comparative data between newer- and 
standard-type public schools, at the elementary, upper elementary, and 
secondary levels. Results were reported by the author which indicated 
that “newer type practices will produce equal if not superior achievement 
in desirable skills, knowledge, attitudes, personal and social adjustments,
and character traits.”





CHAPTER II 


Educational Prevention, Diagnosis, and
Remediation¹


FRED P. FRUTCHEY 


The MOST IMPORTANT use of evidence concerning the mental, social, 
emotional, and physical behavior of boys and girls is to aid in developing 
an understanding of them. Teaching may be based upon valid evidence, 
carefully collected and wisely interpreted, or it may rest upon a series of 
untested assumptions, poor guesses, and wishful thinking—or some degree 
between the two. Teaching which is carefully related to the future as well 
as the present development of boys and girls must have a careful factual 
basis, including not only cross section data but also long-time records. 
As Lefever (123) said, “Without a definite basis for determining the 
nature and extent of the child’s growth, educational planning will be 
reduced to sheer guesswork.” 

There has been some emphasis in the literature upon prevention, rather 
than upon remediation. For the purposes of the present chapter, however, 
we shall not be concerned with the distinction, since we are here giving 
attention primarily to the fact-finding and appraisal aspects which underlie 
prevention, diagnosis, and remediation alike. 

The general desirability of using tests at the beginning of the school 
year in order to understand pupils and to be able to provide appropriate 
educational experiences was described by Chase (107). Tyler (133) and 
Lee (122) dealt at greater length with the close relationship of evaluation 
to the curriculum. Newland and Ackley (125) commented that guidance 
should rest not only upon the diagnosis of failure and subsequent recon- 
structive measures but also upon constructive measures in the prevention 
of failure. 

The most significant treatise on educational diagnosis which has ap- 
peared in the last three years is the Thirty-Fourth Yearbook of the National 
Society for the Study of Education (124). The yearbook contains chapters 
dealing with factors associated with learning difficulty. One section is 
devoted to the principles and technics of educational diagnosis and treat- 
ment. Two chapters by Tyler (132, 134) discuss ideas fundamental to 
any diagnosis or inventory. Two sections are devoted to diagnosis in read- 
ing, English, arithmetic, social studies, natural science, health education, 
behavior disorders of children, speech, vocational interests, abilities and 
aptitude, musical talent, art, leisure-time activities, and creativeness. The 
final chapter by Stenquist (128) takes up the administration of a program 
of diagnosis and remedial instruction. 


1 Bibliography for this chapter begins on page 558.


Clarification of Objectives


A more exact diagnosis in education rests upon a greater clarification 
of the important intangible objectives of education by identifying and 
describing types of behavior characteristics of each of these objectives. 
The study by Rankin (126) on creativeness is especially valuable because 
it illustrates how an intangible objective can be made concrete and mean- 
ingful through the specifying of areas of activity and characteristics of 
behavior. Hartung (120) described aspects of the ability to interpret data.
A clarification of social sensitivity, an important objective of the social 
studies, was presented by Taba (130). A bulletin by Fawcett (112)
illustrated some characteristics of behavior involved in an understanding of
the nature of proof—a mathematics objective. Dale (108) discussed diag- 
nosis in leisure-time activities. In diagnosing reading skills and abilities 
in the elementary school, Wrightstone (143) presented a chart showing a 
list of reading deficiencies, their probable symptoms and remedial treat- 
ments. Witty and Kopel (137) reported an analysis of the difficulties of 
poor readers. Thompson (131) analyzed information test results into areas 
of content. These studies illustrate the clarification and analysis of general 
objectives into subobjectives and characteristics of behavior which are to 
be treated in a remedial procedure. 


Devices for Gathering Evidence 


An examination of the literature indicated that many types of devices 
were used in diagnosis. Various writers (134, 144) pointed out the need 
for different devices. Davis (109) used checklists of behavior character- 
istics relating to reading and other types of behavior. Strang (129) used 
a student’s own analysis of how he reads a problem as a means of obtain- 
ing some insight into his mental behavior. Various instruments to test 
vision, hearing, phonetic aptitude, and other kinds of physiological and 
psychological tests were employed. 

Significant contributions in the uses and interpretation of test results 
as a basis for remedial instruction are being made under the direction of 
R. W. Tyler by the Evaluation Staff of the Progressive Education Associa- 
tion’s Commission on the Relation Between School and College. Although 
reports of this research have not appeared in the literature up to the closing 
of the bibliography, they should be mentioned here because they deal with 
the analysis of various characteristics of behavior which represent a general 
objective. Interpretations of test results are made in terms of kinds of 
behavior as well as in terms of areas of subjectmatter. 


Diagnosis in Reading 

Many of the reported studies contributing evidence of results of remedial 
and corrective instruction in the school are in reading. These studies reflect 
concern about the relation of pupil interests, rapport, pupil recognition 
of attainable goals, extensive reading, and teacher points of view to remedial
reading instruction. Witty and Kopel (141) used pupil interest in
selecting reading materials and planning a reading program. They stated, 
“From indifferent, fearful, and unhappy youngsters most have changed 
into interested, alert participants in numerous school activities.” They be- 
lieved that the changes brought about were not only the results of methods 
used but also of the point of view of the teachers, “. . . a sympathetic 
understanding of children and a sane attempt to meet their legitimate 
needs.” They had objective records of increased amount of voluntary leisure 
reading and a more intelligent and frequent use of books and library 
facilities. Witty and Kopel (140) further reported that “at the beginning 
of the work about one-third of the poor readers approached any reading 
task with trepidation. The change in attitude when attainable goals were 
sensed brought about a personality alteration not readily describable in 
objective terms.” 

In using an extensive reading program in remedial teaching, Ansley 
(101) found that an experimental group gained in comprehension the 
equivalent of one year more than a control group. Anecdotal records of 
the pupils’ interests in reading showed desirable shifts. Garrison (113) 
reported a variety of testing methods used in a remedial approach based 
upon pupils’ interests and extensive reading. Gains in rate and compre- 
hension and greater interest in reading were found. Brooks (105) con- 
cluded that the point of view of the teacher in creating and encouraging 
a desire to read is an important part of remedial work. 

The data from intensive studies of remedial reading in New York City 
led Gates (116) to believe “that at least four out of five deficiencies in 
reading result from failure to recognize the individual pupil’s failure and 
difficulties which crop out from day to day.” A teacher must be alert and 
sensitive to behavior symptoms during the busy day of teaching. Gates
claimed that investigations have been too much concerned with reading 
as an isolated activity. 

Burk’s study (106) of factors in the style of composition showed that 
fourth-graders are most interested in stories containing short simple 
sentences, and least interested in stories written in long complex and 
compound sentences. Long and complex sentences, however, have no effect 
on comprehension, and produce the highest rate of reading. Fourth-graders 
prefer stories containing direct conversation. Highest comprehension and 
rate of reading were obtained with stories using direct conversation. The 
writer pointed out that these are “rounded off” generalizations, not sharply 
supported by the data in the study. 

In the diagnosis of reading difficulties it is important to identify and 
eliminate physiological disabilities in planning a psychological approach. 
Witty (139) pointed out: “In every case of reading disability search 
should be made for visual difficulties. Such examination is a vital item in 
the comprehensive individual diagnosis which should precede remedial 
endeavor. It is clear that the cause of reading disability (as an entity)
lies in no single visual factor.” 


Effect of Diagnosis and Remediation upon Teachers 


A remedial program may produce desirable changes in the attitudes 
and points of view of teachers. Deady (110) stated that in the minds of 
teachers their diagnostic and remedial teaching program has transformed 
“grades” into “boys and girls.” Eurich (111) stated that “the improve- 
ment of examinations stimulates instructors to become critically aware 
of the specific objectives and outcomes of their instruction and leads to 
changes designed to improve both selection of subjectmatter and methods 
of teaching. It compels the instructor to think of numerous illustrations 
of the way in which his instruction changes student behavior.” A funda- 
mental principle of diagnosis and remedial instruction relating to the 
point of view of the teacher is that conclusions drawn from research results 
become promising hypotheses when applied to an individual boy or girl 
(132). A competent diagnostician thinks of them as a tentative working 
basis subject to modification as new facts come to light.





CHAPTER III


The Essay-Type Test¹


CHARLES C. WEIDEMANN and BIRDEAN J. MORRIS


Limitations of the Essay Test


LACK OF RELIABILITY of scores was pointed out by Holroyd (148) as the
first defect of the essay test. The reasons given for the unreliability of 
scores were the lack of objectivity and the influence of such factors as 
English construction, spelling, penmanship, neatness, arrangement of 
form, sympathy for the hard working but slow student, general improve- 
ment, and personal attributes on the grade. Other criticisms were: (a) 
restricted usefulness with almost no opportunity for diagnosis; (b) en- 
couragement of cramming; (c) little basis for comparison between stu- 
dents or classes; (d) encouragement of bluffing; (e) consumption of an 
overshare of students’ and instructor’s time; (f) lack of any known for- 
mula for correction of guessing, as in objective examination; and (g) 
the restricted range of material that can be tested in a given time. 

Criticisms of the essay test as used by many English teachers were listed 
by Stalnaker (157). The first objection is that teachers try to teach the 
pupil to write charming bits of nonsense on subjects of no interest to him 
instead of aiding him to express himself clearly and accurately within the
range of his interests and abilities. Another weakness is the vagueness in 
the instructions given the students. The essay test is rarely read with a 
reliability of over .60, when it should be read with at least a reliability of 
.90. Explanations offered for this inconsistency in the rating of essay tests 
or themes were: (a) disagreement among masters in English on what 
constitutes a good theme, (b) the influence of the reader’s physical condi- 
tion on his grades, (c) the objection to grading a theme high, and (d) 
the traditional use of optional topics. Kandel (150) offered as his objec- 
tions to the essay examination the unreliability of scoring and the time 
involved in the construction and marking of the tests. Wrightstone (160) 
objected to the essay tests on the basis of (a) time-consumption, (b) narrow 
range of information tested, and (c) unreliability and subjective variations
in grading.

One of the real limitations of the essay test in actual practice may be 
that it is not measuring what it is assumed to measure. Doty (146) ana- 
lyzed the essay test items and answers for 214 different items prepared 
by teachers in fifth and sixth grades and found that only twelve of these 
items, less than 6 percent, “unquestionably measured something more 
than recall.” Doty set up a number of criteria for determining whether 
the answers involved a significant amount of reorganization of knowledge,
or whether they involved only direct memory. Some of his conclusions
were: “That a test item is essay in form gives no assurance that it is essay in
fact. . . . The essay-type test used in the classroom measures memory
more often than it measures any other mental process. . . . Teachers are
not measuring all of the objectives of instruction which it is desirable and
possible to measure” (146:30-31).


1 Bibliography for this chapter begins on page 559.


Values of the Essay Test 


The essay test has been named as the only valid test of ability in written 
composition and as the best measure of artistry of effort in expression of 
thought (148). Other merits attributed to it were: (a) ease of construction 
and administration, (b) shortness of time used in construction compared 
to time taken for objective test, and (c) questions can be written on the 
blackboard, saving both labor and materials as compared with tests 
requiring mimeographing. The essay test has values for advanced students 
because: (a) it reveals reasoning procedures, originality, and initiative; 
(b) it tests ability to organize; (c) it offers opportunity to exercise dis- 
crimination and judgment; and (d) it allows for interpretation of thought. 

Makers of objective items frequently argue that such items call for 
thinking and problem solving just as definitely as do essay tests. Jones 
(149) attacked this claim, pointing out that a large number of the objective 
items examined revealed very few predominantly thought questions. Most 
of the items were concerned with definitions, memorized formulas, dates, 
and proper names. Jones added, “When we are ready to give as much 
time to good essay examining as we now are giving to objective forms of 
examining in many centers, we will doubtless strengthen our college 
education considerably.” 

Wrightstone (160) said that essay questions may be defended for measuring
certain objectives. The objectives named were: (a) an attitude
toward some social, political, or economic phenomena; (b) the organization
of social studies facts; (c) the interpretation, evaluation, or discussion
of social studies facts and data; and (d) an application of social
studies principles to described events or situations.

One type of defense for the essay test is to attack the reliability of the 
objective test. Pullias (153) after a thorough study which is of far-reaching 
significance concluded that: (a) tests may be objective in the sense that 
all personal opinion is eliminated in scoring and still fail to remove 
important personal elements from the evaluation of pupil achievement,
(b) measures of pupil achievement obtained from different informal 
objective tests may be expected to vary to a considerable extent, and (c) 
pupil ratings based upon standardized test scores show marked disparity. 
Pullias based his conclusions on the analysis of 6,200 teacher-made ob- 
jective test papers given to 3,100 pupils, and upon 1,380 standardized 
test batteries given to 460 pupils, in the fifth and sixth grades of public 
school systems. 


Comparison of Mental Functions Measured


Meyer (151, 152) directed one group of students to prepare for a true- 
false test, a second group to prepare for a multiple-choice test, a third 
group for a completion test, and a fourth group for an essay test. He then 
gave each of these groups all four types of test. He found all four types 
of test equally good for the measurement of recalled facts, but found that 
the completion and essay tests gave higher scores for a later testing of 
the same students. Seeking evidence on the methods of study employed 
by the four different groups of students, Meyer obtained statements from 
them, and also examined their markings in the books they had studied. 
He concluded that students studying for the objective tests tend more to 
the (a) underlining of words, phrases, and sentences; (b) listing of 
names, places, dates, and numbers; and (c) framing of practice test 
questions. Those studying for essay tests tended toward the (a) making 
of summaries in paragraph form, (b) drawing of maps, and (c) taking 
of random notes. 

The type of material learned varied in accordance with the method of 
study. Students who were preparing for objective tests tended to (a) learn 
facts, (b) memorize statements, (c) put emphasis on details, and (d) 
learn definitions, words, and figures. Pupils studying for the essay test 
attempted to (a) get a general view of the material, (b) form personal 
opinions, (c) interpret material, and (d) fix the general outline and 
then add the details. When studying for an objective test, a student said, 
“I stuff my memory with as many facts as I think it likely to retain for 
the required time, until and including the test, and then quickly forget 
everything except the few points that appealed to me as most important.” 

Jones (149) stated his belief that instructional emphasis and the 
student’s own efforts are almost certain to follow in the trail of the prin- 
cipal methods of examination used in a school. If emphasis were laid on 
factual details, the student would naturally turn to underscoring correct 
items or to listing points on memory cards. If students were graded on 
the quality and substance of their essays, they would try to improve in 
this respect, and examiners would take more interest in aiding develop- 
ment along this line. Jones contended that the essay examination still holds 
the attention of the average professor in the field of social science or the 
humanities. “He is more interested in a whole examining picture, a 
Gestalt, than in separated examining objectives. He is willing to have 
many facts omitted, but he wants the student’s own organized expression, 
for without expression education is meaningless and the whole mechanism 
of instruction becomes pedantry” (149:202). 

Jones (149) reported that, in answer to the statement, “I think one’s 
ability is far better shown through discussion questions than through short 
objective questions,” 68 percent of the students in colleges which give 
senior comprehensives, and 55 percent of the superior students in other
colleges, answered positively. Alumni taking both types of examinations
offered even more favorable comments on the essay test. The alumni
favored the essay because they felt that it was more important to be able to 
discuss an issue than merely to check it. From a survey made of the examination
system of Harvard College, Hanford (147) stated that undergraduates
favor the “reasoning, speculative type of examination questions.” 

Hanford (147) held that questions challenging opinion, presenting a 
problem, or calling for critical comment, lend themselves to the essay 
method of discussion. In speaking of the General Examination System of 
Harvard College, he stated that the examiners prefer the essay test which 
involves analysis and explanation, opinion, or evaluation of propositions. 
The essay test has been found by them to be suitable for (a) measuring 
the student’s ability to use and correlate knowledge, (b) discovering how 
far the student has grasped the meaning of the material studied, and (c) 
discovering the use the student can make of the material. 

Kandel (150), offering an argument in favor of the essay type of 
examination, said that the essay test provides better evidence of the 
understanding, reasoning, and ability to organize information than do 
objective tests. An experiment was performed to find the results of the 
essay and objective test given in the same subject, United States history, 
to the same pupils. According to these data the essay test proved to be 
slightly superior to the objective test as an instrument for measuring 
understanding. Stewart (158) concluded that well-chosen essay-type ques- 
tions give a teacher a knowledge of the pupil and an understanding of 
his thought processes that cannot be obtained by any other means. 

Raths (154) listed the major objectives of the thirty schools in the 
eight-year project of the Progressive Education Association. These objectives 
were classified under eight heads: (a) thinking; (b) interests, aims, and 
purposes; (c) attitudes; (d) study skills and work habits; (e) social adjustment;
(f) creativeness; (g) functional information, including vocabulary;
and (h) a functional social philosophy. Of these eight, four called for the 
use of the essay test, namely, the first, fourth, seventh, and eighth. 


Suggestions for Improving the Essay Test 


Holroyd (148) suggested that the technics of the essay examination be 
refined and used in composition exercises and for tests of reasoning, 
judgment, organization, and appreciation in the case of advanced students. 
It was further suggested that the essay test be used with other tests, per- 
mitting (a) a wider sampling, (b) greater objectivity in scoring, and (c) 
a comparison of standards. For solving the problem faced jointly by the 
English teacher and test technician, Stalnaker (157) suggested con- 
structing a test which would test the permanent writing skill of the student. 
The first step was to make clear what abilities were to be measured. One 
exercise should not attempt to measure all the abilities; each ability 
should be measured separately. Short exercises are advised so that there 
will be examples enough for a dependable measure. Questions should not 
be formulated to inspire the student, since the object is not to test his
inspired creative genius but his consistent underlying ability to express 
himself. Stalnaker described tests which purport to measure the student’s 
knowledge of grammar and general mechanics and at the same time 
afford a rough measure of the student’s writing ability. The first-mentioned 
test is the construction-shift test in which a sentence is given the student 
and he is told to make in it a specified shift in structure. Though not 
strictly an objective test, it is read with a reliability of .98. Two other
types are mentioned which can also be read with a high degree of con- 
sistency. In the first of these, verbose sentences are to be reduced to as 
few words as possible without sacrificing essential ideas. The other test 
consists of short, choppy sentences to be rewritten into a smoothly running 
paragraph without changing organization or ideas. 

Wrightstone (160) said that the essential step in improvement was the 
omission of questions that test mainly recall of information. He further 
suggested that the examiner determine the objective or objectives to be 
measured and devise appropriate questions for each objective. For the 
scoring of such questions as those asking the pupil to describe, compare, 
contrast, explain, interpret, discuss, develop, evaluate, and summarize, 
he suggested the use of scaled samples. 


Scoring the Essay Test 


Meyer (151) tried grading answers to essay tests in three ways: (a) 
giving points for correct facts called for, (b) giving additional points for 
correct facts supplementary to those required, (c) giving additional points 
for organization. When grading on facts, one point was allowed for each 
correctly stated fact that was pertinent; organization was rated on a ten- 
point basis. Intercorrelations between total scores ranged from .83 to .91. 
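
The three grading schemes described above can be illustrated with a minimal sketch. Only the rule of one point per pertinent fact and the ten-point rating for organization come from the description; the function names, the cumulative reading of schemes (b) and (c), and the sample figures are assumptions made for illustration, not Meyer's actual procedure or data.

```python
from math import sqrt


def score_paper(required_facts, extra_facts, organization):
    """Return the totals under schemes (a), (b), and (c) for one paper.

    required_facts: number of correct, pertinent facts that were called for
    extra_facts:    number of correct facts supplementary to those required
    organization:   reader's rating of organization on a 0-10 basis
    Assumption: each scheme adds its points to the preceding scheme.
    """
    if required_facts < 0 or extra_facts < 0:
        raise ValueError("fact counts cannot be negative")
    if not 0 <= organization <= 10:
        raise ValueError("organization is rated on a ten-point basis")
    total_a = required_facts              # one point per required fact
    total_b = total_a + extra_facts       # plus supplementary facts
    total_c = total_b + organization      # plus organization rating
    return total_a, total_b, total_c


def pearson(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 scores")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to exist")
    return sxy / sqrt(sxx * syy)


# Hypothetical papers: (required facts, extra facts, organization rating).
papers = [(6, 2, 7), (4, 0, 5), (9, 3, 8), (5, 1, 4), (7, 1, 9)]
totals = [score_paper(*p) for p in papers]
scheme_a = [t[0] for t in totals]
scheme_c = [t[2] for t in totals]
r_a_c = pearson(scheme_a, scheme_c)  # intercorrelation of two total scores
```

A correlation computed in this way between any two of the three totals corresponds to the kind of intercorrelation reported above; the value obtained here depends entirely on the invented sample.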

Stalnaker (155) believed that the essence of measurement is objectivity. 
He referred to evidence that the readers of the College Entrance Exami- 
nation Board tests can grade papers with reliabilities of over .90 and in 
some instances over .98. For tests of composition ability, he would elimi- 
nate the use of letter grades with absolute literary significance. The exer- 
cises should be evaluated with points for definite elements which com- 
petent teachers judge to be significant. He felt that such methods would 
produce reliabilities as high as .95 without a sacrifice in validity. Stal- 
naker (156) believed also that the use of optional questions is a cause 
of unreliability in grading essay tests. He said, “Asking all students to 
run the same race is a feasible step in improving the essay examination.” 

Possible improvements in the marking of essay tests were offered by 
Wrightstone (160). The examination should be planned to measure one 
defined objective of instruction, such as an attitude or interpretation of 
facts, for which no valid and reliable objective test is available. A definition
of the objective should be accepted by all the readers of the
examination, and certain standards of measuring values should be agreed
upon by the readers. If the teacher wishes to include several objectives,
such as pupil’s organization of facts, social attitudes, and neatness, the 
paper should be graded for one purpose at a time and the grades assigned 
separately. The teacher or teachers grading the papers should first decide 
upon the aspect to be marked, then an ideal answer should be formulated, 
and each part assigned a certain number of points. Wrightstone recom- 
mended an eleven-point scale for grading, from 0 through 10. 
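
The procedure recommended by Wrightstone may be sketched roughly as follows. The paper is graded for one objective at a time against an ideal answer whose parts carry points, and each objective receives its own grade from 0 through 10. The requirement that the parts of each ideal answer total exactly ten points, together with all names and sample content, is an assumption introduced for illustration and is not stated in the source.

```python
def grade_objective(ideal_parts, parts_found):
    """Grade one objective on the eleven-point scale, 0 through 10.

    ideal_parts: mapping of each part of the ideal answer to its points
                 (assumed here to total exactly 10)
    parts_found: the parts the reader judged to be present in the paper
    """
    if sum(ideal_parts.values()) != 10:
        raise ValueError("points for the ideal answer must total 10")
    unknown = set(parts_found) - set(ideal_parts)
    if unknown:
        raise ValueError(f"parts not in the ideal answer: {sorted(unknown)}")
    return sum(pts for part, pts in ideal_parts.items() if part in parts_found)


def grade_paper(ideal_answers, findings):
    """Return a separate grade for each objective, never a combined mark."""
    return {
        objective: grade_objective(parts, findings.get(objective, set()))
        for objective, parts in ideal_answers.items()
    }


# Hypothetical ideal answers agreed upon by the readers beforehand.
ideal_answers = {
    "organization of facts": {
        "states causes": 4, "orders events": 3, "draws conclusion": 3,
    },
    "social attitudes": {
        "weighs both sides": 5, "supports view with evidence": 5,
    },
}

# Hypothetical judgments by one reader for one paper.
findings = {
    "organization of facts": {"states causes", "draws conclusion"},
    "social attitudes": {"weighs both sides"},
}

grades = grade_paper(ideal_answers, findings)
```

Keeping the grades in a separate entry for each objective reflects the recommendation that the paper be graded for one purpose at a time.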

Regarding the examinations, largely essay, used at Harvard College, 
Hanford (147) stated, “Through the preparation and grading of the 
examination by the same persons, by constant and intimate consultation 
regarding line cases, and by means of the oral examination as a supple- 
ment to the written examination for line cases, an attempt is made to obtain 
reliability and validity of the examinations.” According to Thurstone 
(159), the essay test should be restricted to one or two pages. He suggested 
that grading can be improved if done with reference to a predetermined 
list of ideas which shall be regarded as acceptable in the replies and 
perhaps still other ideas that shall be regarded as not acceptable. Kandel 
(150) said that it is possible to mark essay examinations with a reliability 
of .80 or over; they are usually marked with reliabilities from .30 to .50, 
and .70 or over is rare. Methods should be used to restrict answers to 
specific questions, and Kandel offered the same suggestions on procedure 
as those already adopted by the Regents’ or College Entrance Examination 
Boards. These suggestions are: (a) agreement on what questions should 
be marked for, (b) analysis of an ideal answer, and (c) assignment of 
a certain number of points to each significant part of a question. 


Summary 

More use of improved forms of the essay test along with objective tests 
is recommended in order to attempt a more comprehensive measurement 
of differing mental functions. Much research is needed to indicate how 
the essay test may be improved. Almost no literature touches the problem 
of validity of the essay test. No studies of the “Form A-Form B” reliability 
of the essay test along lines similar to “Form A-Form B” consistency 
determinations of objective tests are available. The problem of scoring 
essay and objective tests is not that of developing a high consistency cor- 
relation between two sets of essay scores, or a high correlation between 
essay and objective test scores over the same material, thus reducing the 
essay to a form measuring approximately the same mental functions as 
the objective test measures. The problem involves a low correlation be- 
tween essay and objective test scores over the same material with high 
consistency coefficients. Under these conditions each test type would 
measure mental functions unique to its type and thus decrease the over- 
lapping of the mental functions measured. It seems probable that there 
is a definite place, need, and use for improved forms of essay tests in 
the secondary and college levels of learning. 
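
The condition described in this summary, high consistency within each test type together with a low correlation between the two types, can be made concrete with a small sketch. The scores below are invented solely to show the pattern; the function names and the use of two readings or forms per test type are assumptions, and nothing here is drawn from the studies reviewed.

```python
from math import sqrt


def pearson(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 scores")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to exist")
    return sxy / sqrt(sxx * syy)


def summarize(essay_a, essay_b, objective_a, objective_b):
    """Report the two consistency coefficients and the cross-type correlation."""
    return {
        "essay_consistency": pearson(essay_a, essay_b),
        "objective_consistency": pearson(objective_a, objective_b),
        "essay_vs_objective": pearson(essay_a, objective_a),
    }


# Invented scores for six pupils on the same material.
essay_a = [1, 2, 3, 4, 5, 6]       # essay test, first form or reading
essay_b = [2, 1, 3, 4, 5, 6]       # essay test, second form or reading
objective_a = [3, 5, 1, 6, 2, 4]   # objective test, Form A
objective_b = [3, 5, 2, 6, 1, 4]   # objective test, Form B

report = summarize(essay_a, essay_b, objective_a, objective_b)
```

In this invented case each test type agrees closely with itself while the essay and objective scores are nearly unrelated, which is the situation in which each type would be measuring functions largely unique to it.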


CHAPTER IV


The Improvement of Classroom Testing²


DOUGLAS E. SCATES 


The improvement of classroom testing seems to lie in the following three
directions: (a) a more carefully considered set of purposes for which test- 
ing is done; (b) a greatly broadened and enriched set of outcomes which 
must be measured; and (c) the development of instruments appropriate to 
the newly created needs. The discussion of these three movements practi- 
cally fills the testing literature, crowding out much of the former concern 
over such questions as old-type vs. new-type tests, true-false vs. multiple- 
choice tests, or instructions to guess vs. instructions not to guess, which 
until recently held the center of interest. The present concern is with more 
fundamental issues; the novelty of objective and standardized testing has 
passed, the surface attractiveness of carefully printed instruments has worn 
off, and those workers who are now leading the thinking are searching the 
testing movement to ascertain what fundamental values are there, and what 
changes can be made to make tests a stimulus rather than a hindrance to 
further educational progress. 

The literature dealing with such topics is primarily of the analytical, 
argumentative type, based on experience and observation. The problems in 
such areas do not lend themselves readily to formal research. The present 
summary will accordingly be made up chiefly of discussion material, with 
supporting research where available. The present treatment does not deal, 
except incidentally, with tests of intelligence, personality traits, and the 
like, as these were covered in the Review of Educational Research for June 
1938. The term “testing” is used in the broad sense which it is gradually 
coming to have, including, by implication, all systematic means of gather- 
ing evidence on pupil performance. 


Changes in Teaching and in Testing 


Each generation seems to discover for itself teleological and methodo- 
logical concepts which it brands as new, or progressive, even though these 
very ideas may have been formulated and voiced centuries or millenniums 
earlier. It is difficult to know what is new; most ideas are new only to indi- 
viduals. It appears however that there are strong movements in education 
today which are actually affecting practice in conventional schools in a way 
which heretofore was only talked about, or practiced only in a few private 
schools. While it would be unreasonable to claim that vitalizing concepts of 
education are new, we may perhaps say that the verbal expression of these 
concepts has reached a new level of definiteness and specificity, in contrast 
to the earlier rhetorical generality. 

2 Bibliography for this chapter begins on page 560. 

Contrasting the present situation with an earlier one, Wrightstone (235)
stated: 


When these [objective] tests were first introduced into the school program, the 
curriculum consisted very largely of the three R’s—reading, ‘riting, and ‘rithmetic. 
At this early period in the development of both curriculum and tests, the emphasis 
was upon a mastery of subject matter. Teachers, parents, and pupils believed that 
classroom education should be mainly the memorization and recitation of assigned 
facts and information. The curriculum and its needs, therefore, influenced the kinds 
of tests which were constructed and used. It is not surprising that practically all of 
the earlier and older tests measured recognition and recall of isolated facts and infor- 
mation in separate subjects. 

Progressive education has arrived at a stage of development, both in curricular 
practices and in testing, where new objectives and practices in education have created 
a need for new tests. Compared with the early curriculum and its emphasis upon infor- 
mation and facts, newer practices in progressive schools have created new objectives 
of instruction. The project technique and the integrated activity unit-of-work have 
opened up entirely new kinds of classroom behavior and objectives. 


Other writers have pointed to the same trend. Changes in testing become 
necessary because of changes in teaching. Lee (193:465) commented: 
“Standardized tests which measured merely a knowledge of facts in lit- 
erature, history, or geography were constructed for an elementary school 
whose central purpose was the mastery of a given body of subjectmatter. 
There is no more reason for using such tests today than there is for using 
books which were written for that type of school.” Cook (175:471) stated 
that while “many of the earlier standardized tests . . . have tended to en- 
courage and perpetuate this type of teaching and learning, our most com- 
petent test builders are striving with a large degree of success to construct 
tests which will discourage this memoriter type of teaching. The best of the 
newer tests avoid the use of stereotyped textbook language, require the ap- 
plication of laws and principles in new situations, emphasize understanding 
and the relationships of ideas rather than mere verbal learning, and stress 
the functional value of what has been learned, rather than the subjectmatter 
itself. Such tests are freed from the specific content of textbooks or courses 
of study, and measure rather the intellectual development which should be 
achieved through that content.” 

Hopkins (188), dealing with the matter in terms of fundamental, more 
or less philosophical, terms, pointed out that the measurement movement 
up to the present has been based on assumptions which grew out of “conceptions
of education crystallized in America about 1900,” the most important
of the assumptions being:

1. A social heritage organized in subjects in which uniform learning of minimum 
essentials is required of all. 

2. A control by the teacher of what is to be learned and how it is to be learned. 

3. A fragment of subjectmatter can be adequately measured in isolation from the 
larger whole of which it is a part.

4. A satisfactory measure of the learning of the individual can be obtained through
some form of verbal test.

After elaborating these and other points, Hopkins (188:204) concluded:
“These assumptions contain no relationship to the internal balance of the 
individual; they do not consider the adjustment or maladjustment of the
individual in his environment; they are not applicable to a variable learn- 
ing situation in which uniformity has been removed both from the materials 
and the controls. . . . The problem of measurement . . . is to build new 
assumptions and technics that center measurement in the interaction of the 
individual with his environment and recognize desirable changes in each.” 

It would appear that there are at present forces which would support an 
advance in testing. These forces are expressing themselves in new concep- 
tions of the relation of testing to education, and in new conceptions of what 
education should accomplish. We shall examine these two areas before con- 
sidering directly the new tests to which they are giving rise. 


Lessening Emphasis upon General Standards 


The appearance of the new-type test, leading quickly to standardized 
forms, was a matter of sufficient moment to require a reconsideration of edu- 
cational theory and the development of an educational philosophy which 
would incorporate the new instruments into a unified program. The last ten 
years have afforded gratifying progress in this direction. 

In the absence of an adequate philosophy, and under the stress of increas- 
ing demands for efficiency in process and product, the natural thing to do 
was to emphasize standards. That was the emerging thought in industrial 
production; the ideology carried over into teaching. It dominated practice 
and thinking in the ’twenties; it has prevailed in practice during the ’thirties,
and probably will in the ’forties. In the literature, however, a new emphasis
has been growing. 

Lee (193:465) stated that “many of the educational crimes which have 
been committed in the past twenty years have been due to a wrong concept 
of the purpose of measurement.” Beers (165:578) wrote that “there appears 
to be no lack of interest in the giving of the tests; but what to do with the 
results, once they have been assembled, is shrouded in black mystery for 
many teachers and school authorities.” We are well acquainted with the 
conventional uses that have been made of test results. 

Wood has been one of the most consistent critics of these conventional 
uses. Writing in the first cycle of the Review of Educational Research (Feb- 
ruary 1933), he made the somewhat shocking point that raising the scores 
of pupils on a test, through increased learning, was not the end to be sought, 
that for many pupils such an increase was immaterial, and for some of them 
it was undesirable. He was writing of algebra, but his philosophy was gen- 
eral. He criticized the widespread use of tests for administrative purposes 
—marking, promoting, classification, credit, admission, retention—and em- 
phasized the need for using tests for a continuous study of the individual, 
and for adapting both the curriculum and the instructional methods to the
needs of that individual. Wood (228:9) wrote: “Tests should first of all tell
what a pupil should try to learn—not how he may be cajoled, persuaded, or
insidiously coerced into learning. . . .” 

Lindquist (196:73) conceded that “undue emphasis upon average test 
results, upon school-to-school and teacher-to-teacher comparisons . . . may 
cause the teacher . . . to neglect the interests of the pupils, and to be con- 
cerned instead with subject matter objectives and with higher average scores 
for their own sake.” 

Other writers (182, 190, 200) joined in decrying the use of tests to en- 
force standards. It may be pointed out that the technical literature has con- 
tained much evidence, over a period of time, that the norms themselves, of 
various standardized tests, are open to considerable question. The trend of 
thinking seems not to have been influenced by such material, but rather by
an analysis of purposes.


Putting Our Knowledge of Individual Differences To Work


Full realization of differences between individuals may be regarded as 
a byproduct of measurement; probably the incorporation of the facts of 
such differences into a working philosophy of education will be accom- 
plished at the same time that the potentialities of measurement are assimi- 
lated. The following excerpts suggest certain uses of test results as con- 
tributing to, and as consequent upon, an understanding of individual 
differences. 

Charles W. Eliot, former president of Harvard University, is quoted 
by Wood (229:229) as saying: “Uniformity is the curse of American 
schools. That any school or college has a uniform product should be 
regarded as a demonstration of inferiority. . . . Every child is a unique 
personality. . . . Uniform programs and uniform methods of instruction
. . . must be unwise and injurious—an evil always to be struggled
against.” That was in 1892. 

Speaking of standards as something to be gotten away from, in the 
direction of individual adaptation, Wood (228:9) himself said: “Even 
if some do learn the prescribed minimum under the pressure of ‘remedial’ 
treatment, the results might not be worth the effort. Indeed, if we consider 
the attitudes of despair, the feelings of inferiority, the habits of depend- 
ence, the frequently temporary and superficial, if not fictitious, character 
of forced learning, and the loss of opportunity and time for learning some- 
thing that is within the comprehension and interest of the pupil, it is not 
by any means certain that the efforts to ‘remedy’ children up to prescribed 
minimum are not positively harmful.” 

Speaking on the positive side, the same writer (229: 233, 234) claimed: 
“We have no right to ask or encourage any pupil to learn a subject unless 
we have reasonable grounds for believing at least two things: first, that the 
pupil has the necessary ability or capacity to learn that subject; and 
second, that learning that subject will, all things considered, tend to make 
him a better and happier citizen more surely than would anything else that 
he might do with his energies at that time and place. . . . The teacher’s
duty to learn the child is prior and paramount to the duty to teach the 
child. . . . The objectives of education in our public schools are accord- 
ingly: first, to try to ascertain the intellectual, personal, and social needs 
of each individual child; and second, to try to meet those needs, whatever 
they may happen to be.” 

Tyler made the same points: “In appraising the progress of each student 
adequate consideration should be given to his individual pattern of de- 
sirable educational goals. The objectives of the course do not represent 
points to be reached by all students but rather directions in which students 
may progress. . . . When objectives are conceived as uniform goals to be 
attained by all students, teaching tends to become an attempt to maintain 
a lock-step march to these goals, while testing is used to discover whether 
the students have reached the goals. Such a conception omits the vast array 
of facts regarding individual differences. Individuals differ not only in 
rate and methods of learning but in interests, needs, and potential abilities. 
How far each student may be expected to progress toward any objective 
varies with his needs, his interests, and those abilities of his which are 
involved in this progress. . . . In this sense, objectives become individual- 
ized as do teaching and learning procedures. 

“It is much easier to accept the median achievement of a group of 
students as the goal for each person than it is to try to formulate a suitable 
individual pattern of goals. . . . The proper conception of evaluation
eliminates purely mechanical appraisal and substitutes judgment and 
thoughtful consideration. This does not imply intuitive appraisal but 
demands valid judgments based upon the careful collection of compre- 
hensive evidence regarding student progress” (187:13-14). For more 
detailed statements one may read Voelker’s suggestions (225). 


Adjusting the Learning Load 

Connor (174:291) reported evidence that pupils who are working 
beyond their capacity are the chief sources of behavior troubles. “The 
pupils who showed the higher incidence of behavior difficulties also showed 
higher achievement in relation to mental ability. In other words, the slow 
pupil, working with a curriculum somewhat too difficult for him, tends, 
first, to respond by excessively hard work and relatively high achievement, 
and later, by other forms of less satisfactory social behavior calculated 
to gain attention.” When the learning load was adjusted to the ability of 
the pupil, a more even development of character and personality through- 
out the pupil population resulted. 

In attempting individualization, Cook (175:473) reported the deter- 
mining of capacity by other than intelligence tests. “Intelligence tests are 
not an integral part of the testing program in the laboratory schools [of 
the Eastern Illinois State Teachers College], because no purpose has been 
found for them that is not better served by tests of achievement or of 
specific aptitude.” Cook recognized eight areas of special capacity, and 
attempted to obtain measures of capacity in each of these rather than to 
use a general intelligence test which represented some sort of an average 
of all the capacities, with an unknown weighting. 

Wood (229) listed ten criteria for individualized instruction. The litera- 
ture written by those not intimately connected with the testing movement 
contains many articles urging greater individualization of instruction. 


Codified Statements of Testing Purposes 


In the preceding cycle of the Review of Educational Research Sangren 
(216) listed six purposes of testing: (a) prognosis, (b) surveys, (c) 
diagnosis, (d) instruction (including grouping and marking), (e) ex- 
perimentation and research, and (f) guidance. Raths (212) gave the 
following five purposes: (a) appraising specific types of achievement for 
individuals, for groups, and for schools; (b) reporting to children, par- 
ents, other educational institutions, and to employers; (c) appraising 
continuously the methods and materials employed in teaching; (d) ob- 
taining a comprehensive picture of an individual or of a group; and (e) 
experimentation. The list is not exhaustive. 

Jones (190), writing on achievement in literature, mentioned six pur- 
poses for evaluation: 

1. Assisting the individual student more effectively to achieve, through his study 
of literature, the larger educational objectives with which he has identified his own 
purposes. 

2. Making the student conscious of desirable objectives of education which may be 
enjoyably achieved through the study of literature, but with which he has not yet 
identified his own purposes. 

3. Giving the teacher a clearer understanding of the student’s present personal and 
social needs, to the end that he may more competently direct that student’s future edu- 
cation through literature. 

4. Discovering to the teacher the general backgrounds and abilities of a given class 
to the end that he may select with greater appropriateness the literary materials which 
will prove most effective in achieving the classwide objectives of education. 

5. Giving the teacher a sounder basis upon which to make recommendations to 
others for the future or supplementary direction of the student’s education. 

6. Furnishing the teacher with tangible evidence that will be useful in explaining 
the student’s literature program to interested parents. 


This list suggests that the teacher is not simply teaching literature, but 
that he is striving to do his part in contributing to a general education, 
and is using literature as the medium. Measurement aids him in specific 
ways. 

Cook (175) gave a somewhat elaborate list, the main points of which 
were: (a) to redirect the curriculum emphasis; (b) to provide a basis 
for educational guidance of pupils; (c) to encourage pupils to put forth 
their best efforts; (d) to direct and motivate supervisory efforts; (e) to 
provide a basis for the marking and promotion of pupils; and (f) to 
build and maintain desirable skills, abilities, and understandings. “Test 
periods are considered as very effective learning periods as well as testing 
periods.” Subpoints given under (b) are: predicting pupil performance; 
classifying pupils; diagnosing learning difficulties; setting up standards 
of pupil performance; discovering special aptitudes; discovering pupils 
in need of guidance and individual consideration; and measuring pupil 
achievement. The uses include aptitude, diagnostic, and general achieve- 
ment tests; intelligence tests, as previously stated, are not regularly used. 

Lee (193) listed five uses of intelligence tests, six uses of informal 
(teacher-made) tests, and eight uses of standardized tests. Lee and Segel 
(195) obtained questionnaire responses from 1,600 high-school teachers 
and reported that only two purposes were each listed by more than half 
of the teachers—to aid in determining marks and to discover topics that 
need to be retaught. Other uses which were frequently mentioned are: 
(a) to discover the quality of work a pupil should do; (b) to discover 
what topics should be taught; (c) to stimulate pupils to do better work; 
(d) to evaluate strengths and weaknesses of instruction; (e) to aid in 
determining the future educational program of the pupil; and (f) to 
classify pupils into ability groups. 

Beers (165:579), writing primarily for the college level, said: “The 
majority of 311 colleges reporting on their most valuable experience with 
tests cited vocational guidance as first. . . . Test data furnish evidence 
for gauging the amount of class work to be carried, for encouraging 
superior prospects in undertaking senior college work, for making scholar- 
ship recommendations, for determining the amount of work for self- 
supporting students, and for stimulating both faculty and students. These 
and other uses indicate that testing and guidance, far from being a mere 
formality, serve a much felt, practical need.” Referring to a certain col- 
lege, he added: “Fitting education to the individual and not the conven- 
tional reverse procedure is looked upon as a major responsibility of the 
faculty.” A digest of test uses (163) has been prepared, covering the uses 
reported by colleges. 


Relation of Testing to Teaching 


Cook (175:470) stated: “Tests are both powerful and dangerous instru- 
ments. . . . Every testing program has far-reaching effects on the cur- 
riculum, on the objectives and methods of instruction, and on the study 
habits of the pupils. . . . Since systematic testing tends to focus instruc- 
tional effort upon the characteristics measured by the test, it is highly 
important that these include all the desired outcomes in instruction.” 

Lee (193:466) reported: “The writer has had the opportunity to observe 
the effects of two types of testing programs. In one case tests of facts and 
skills were given at the end of the year by a state department of education. 
In the other case tests of skills were given at the beginning of the year and 
were selected by the local school authorities. The first procedure resulted 
in cramming by the pupils and in much cheating on the part of teachers. 
In the system that gave tests at the beginning of the year, the teachers were 
vitally interested in analyzing the needs of each pupil. There was no 
feeling that the teacher was being judged; the attention of everyone was 
focussed upon the needs of the child.” 

Realizing that the customary interest in comparing averages of different 
classes has consequences which “are unfortunate, particularly because 
they create in the classroom teacher an attitude toward the tests which 
directly interferes with the effective use of the results in pupil guidance,” 
Lindquist (196: 72, 74) nevertheless feels that “these must be recognized 
as quite legitimate uses of test results,” and regards the matter as a prob- 
lem of the administrator and not of the test-maker. Elsewhere he stated 
(197: 484): “The standardized test should be looked upon solely as a 
measuring instrument, and not as a teaching instrument or as an abbrevi- 
ated course of study.” 

Stalnaker (219: 38) stated: “Fruitful theory may result from consider- 
ing teaching and measurement as completely separable and independent 
functions. But practically tests do have a direct bearing on the curriculum. 
The English test which has a bad influence on the teaching of English is 
not a good test.” 

Raths (211: 91) wrote: “Emphasis should be placed on the fact that 
evaluation is not something that comes at the end of teaching. . . . The 
formal tests are to be given neither at the end nor at any specified time, 
but should be an integral part of the teaching. They are given for the sole 
purposes of diagnosing student difficulties, measuring growth, and afford- 
ing teachers an opportunity to do guidance work. The exercises included 
in any one particular test are either similar to the ones students meet in 
their daily school experiences or of the kind that serve as a highly reliable 
index of the achievement of students in the everyday classroom situations.” 
This position emphasizes an instructional viewpoint, in contrast to an ad- 
ministrative one. 

Foreseeing the ultimate effects of a test on teaching, McCall and his 
collaborators (198: 424) reported special consideration of this point. 
They said: “It was important, also, that the test be scrutinized for the 
influence of every question upon those who would read it or give it or 
answer it—child, teacher, superintendent, school board, publicist—that 
the test be filled with suggestions for better aims, better methods, better 
activity, and that the language of the test suggest concrete ways of becom- 
ing better members of society, better friends, better thinkers and appraisers, 
better leaders, followers, and cooperators, better learners and teachers.” 

Brownell’s regard (168) for the influence of test content upon instruc- 
tion is indicated by the criteria he would employ in selecting tests: (a) 
Does the test elicit from the pupils the desired types of mental process? 
(b) Does the test enable the teacher to observe and analyze the thought 
processes which lie back of the pupils’ answers? (c) Does the test encour- 
age the development of desirable study habits? (d) Does the test lead 
to improved instructional practice? (e) Does the test foster wholesome 
relationships between teacher and pupils? The last three of these are seen 
to stem directly from an interest in the reaction of the test on instruction, 
and the first two contribute to keep the teacher from being misled by the 
results. Brownell further said: “It should be stressed that in the practical 
enterprise of educating children, teaching and learning are inseparably 
united with measurement. . . . A test is good if it furthers sound, eco- 
nomical learning and advances the quality of instruction” (168: 485). 


Review of Purposes of Classroom Testing 


The multitude of varying claims and statements represented by the ma- 
terial cited presents a problem of interpretation. The classroom teacher 
recognizes the commonsense necessity for carrying his pupils along far 
enough in their learning so that they can fit into the system of schooling 
at the end of the year. He is deluged with statements in the literature that 
goals, curriculums, and progress should be individualized and broadened 
far beyond the usual limits assigned to subjects. He cannot do everything, 
or please everybody. How shall he chart his course? 

The answer, as indicated earlier, awaits the crystallization of an educa- 
tional philosophy which is adequate for the present-day problems of edu- 
cation. The ultimate resolution of the conflicting interests cannot be clearly 
foreseen. It may be, however, that our thinking will sooner or later settle 
upon lines somewhat according to the following. 

We have been thinking of education far too simply. We cannot continue 
long to ignore the facts of differences in background, learning rates, and 
ultimate learning levels of children, or the host of factors in their environ- 
ment which affect (retard, accelerate, direct, or complicate) their learning 
day by day, and which may abruptly terminate their formal schooling 
before they have reached the limits of their mental capacity. Nor can we 
slight the complicating emotional factors that result in, and result from, 
formal learning difficulties. Such differences have been attested thoroughly, 
over a considerable period of time, in the educational, psychological, and 
sociological literature. 

True, schools have tried a great many adaptations in the light of these 
differences. Teachers and administrators who are close to the situation rec- 
ognize the great differences, and make some concessions to them. But the 
facts have not yet worked themselves through a consistent philosophy. It 
is still conventional for a school survey to compare a city school system 
with “the norm.” Administrators still compare schools and classes and 
teachers on the basis of test results. Only recently the schools of a large 
city were listed in the newspaper in order on the basis of their average 
reading scores in a particular grade. We recognize differences in children 
when we center attention on them, and promptly drop the facts from our 
minds when we think of closely related matters. We still think of education 
far too simply. 

If we are serious in suggesting individualization of goals, of rates of 
progress (they occur in spite of us), and of learning experiences, the 
administrator will have to drop entirely the whip hand he holds over the 
teachers in terms of comparing their classes with “the norm,” with other 
classes, or with any other arbitrary quantity. He will have to obtain his 
evidence on the efficacy of instruction from other sources; or he will have 
to draw his conclusions from a multitude of different kinds of tests, cover- 
ing a variety of outcomes such as administrators seldom think of realis- 
tically. Instead of being content with one factor—achievement—he must 
obtain evidence on a variety of forms of capacity, on a variety of previous 
and current environmental factors, on a variety of personality (emotional, 
attitudinal, and other) traits, and, in addition, must expand the evidence 
on achievement to cover a breadth of developmental aspects such as we 
are only slowly able to analyze out of the totality of what constitutes a 
well-educated person. And it seems probable that these many lines of evi- 
dence will need to be expressed as amounts of change, rather than as cross 
section values. An appraisal of teaching or learning that is fair to teachers 
and to pupils is probably too complicated for an administrator to make 
on the basis of test results. He should seek other means (182, 226). 

Once administrators, including school surveyors, sense the complexity 
of education, the teacher will be freer to work out an instructional phi- 
losophy. The time has come when we should cease to be primarily inter- 
ested in comparing one child with another, one class with another, or any 
class with a norm. We should be primarily interested in comparing each 
child with himself, with his past record, and with his potentialities. To 
center attention elsewhere is to miss the point—to miss the service which 
tests can render. If lateral comparisons are made as a secondary matter, 
they should only be given consideration when there is evidence on enough 
other factors to warrant a conclusion. Such evidence is not easily obtained; 
it requires much more than formal testing. 
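
The contrast drawn above, between comparing a child with a norm and comparing him with his own past record, can be illustrated with a small sketch. The record structure, the pupil names, the scores, and the norm value below are all hypothetical and are not drawn from any study cited here; the sketch also assumes that the earlier and later scores come from comparable tests, which in practice is itself a matter requiring evidence.

```python
from dataclasses import dataclass


@dataclass
class PupilRecord:
    name: str
    earlier_score: float  # score on an earlier testing (hypothetical data)
    later_score: float    # score on a comparable later testing (hypothetical data)


def gain(record: PupilRecord) -> float:
    """Amount of change: the pupil compared with his own past record."""
    return record.later_score - record.earlier_score


def lateral_standing(record: PupilRecord, norm: float) -> float:
    """Cross-section value: the pupil's latest score compared with a norm."""
    return record.later_score - norm


HYPOTHETICAL_NORM = 50.0  # an assumed norm, for illustration only

records = [
    PupilRecord("Pupil A", earlier_score=30.0, later_score=42.0),
    PupilRecord("Pupil B", earlier_score=55.0, later_score=57.0),
]

for r in records:
    print(f"{r.name}: gain {gain(r):+.1f}, "
          f"against norm {lateral_standing(r, HYPOTHETICAL_NORM):+.1f}")
```

In this invented case Pupil A stands below the norm yet shows the larger gain, while Pupil B stands above the norm with little change; neither figure alone warrants a conclusion about the pupil or the teacher without evidence on the other factors discussed.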

Large city school systems enrol pupils with all degrees of capacity, liv- 
ing in all sorts of favorable and unfavorable environments, with all sorts 
of personality mechanisms. Where is the public school system that has 
frankly and honestly worked out its course of study in the light of these 
differences; that suggests to each grade teacher the variations in each 
subject that should be made for pupils of different capacity groups, having 
different backgrounds and different prospects; that furnishes differentiated 
norms for such pupils, so that the teacher can use them as a guide to 
check himself and the pupil against; and that holds up before teachers 
the ideal of a broad education, as contrasted with the type of achievement 
called for by a common test battery? We cannot expect the teacher to solve 
all of these problems for himself and do a perfect job of it. 

Many of the writers on individualization and guidance, in the testing 
literature, are concerned with the high school and college. It is easier to 
“guide” a pupil away from a course in plane geometry, than it is to “guide” 
him through the tool subjects. The elementary-school teachers face a dif- 
ferent problem. Their problem is to see that the child attains a level of 
ability in reading, arithmetic, spelling, language, and fundamental under- 
standings in the social studies that will be regarded by him in his later 
life, and by society, as representing the best distribution of learning oppor- 
tunity in the elementary school. The tool subjects cannot be slighted; but 
neither should pupils be very greatly worked beyond their capacity, and 
neither should slow pupils be denied variety and breadth. 

Individualization calls for a restudy of what should be regarded as min- 
imum essentials in the tool subjects for each different capacity group, for 
each different number of years in school. Such curriculums must be viewed 
vertically, from the primary grades up, instead of being thought of as 
minor departures from the standard curriculum for any grade. And edu- 
cators must look through the eyes of the dull child and of the bright child, 
on their present worlds and on their future worlds in conceiving the proper 
curriculums. 

From such a perspective the ordinary use of the commercially available 
standard test, with an emphasis upon comparisons of one pupil with an- 
other, and of the class average with the norm, appears for what it is 
worth (200). Viewed in the light of the broadening objectives which 
educators are coming to recognize, our conventional procedures dwindle 
still further in significance. 

We still think about education in too simple terms, and so long as we 
continue to do so, testing will furnish results of doubtful benefit. 


Broadened Conceptions of Educational Objectives 


“Probably no other factor in modern education has had more to do 
with a re-definition of the aims of education than has the testing move- 
ment. . . . Educators have re-defined the objectives of education in social 
terms, and some of them—particularly the supervisors of instruction—have 
demanded that the orgy of testing for knowledge and skill cease.” Thus 
wrote Connor (174: 290). It was pointed out earlier that educational prac- 
tice was apparently changing, and that test design and use were following. 
We shall here consider the apparent changes in objectives of education, 
some emanating from an advancing curriculum philosophy and some forged 
by the necessity of adapting testing instruments to a greater variety of 
important outcomes. 

Leary (192) reported an analysis of 1,660 recent courses of study, indi- 
cating that 66 percent of them include among their objectives “the devel- 
opment of desirable attitudes, appreciations, and understandings.” Samples 
of some of the objectives are quoted. The findings are heartening. 

The Evaluation Staff in the Eight Year Study has divided desirable 
outcomes into the following ten areas (211:90): (a) aspects of reflective 
thinking; (b) interests, aims, and purposes; (c) attitudes; (d) social 
information; (h) appreciation; (i) social sensitivity; and (j) functional 
philosophy of life. These areas are stated somewhat more fully. They are 
given elsewhere by Raths (212, 213) as eight areas, with (b) and (h) 
combined, and (d) and (i) combined. Wrightstone (235) listed six ob- 
jectives and Reene (215) four objectives which would fall under the above 
categories. 

New York State recently adopted a revised list of cardinal objectives 
(203, 233) for elementary education. The list contained six objectives, 
stated in broad terms. Eginton (179) listed 21 areas, each with subdivi- 
sions. McCall and others (198) included 19 different areas in a test they 
recently constructed, which is only one of a set of four parallel tests. 
Other writers also contributed to statements of objectives which should be 
measured (182, 224). 


Examples of Newer Measuring Instruments 


The expanded objectives have afforded a challenge to a number of work- 
ers in the testing field, and a variety of means of obtaining evidence has 
been devised and experimented with. In attacking the problem, these 
workers have not been limited by preconceptions of what form the instru- 
ment should have, but have taken recourse to various means, such as 
anecdotal records, checklists, ratings, observer-diary records, question- 
naires, informal reports, and interviews. Some of these instruments or 
procedures are ready and available for general use; others are still in 
preliminary stages. As illustrations of developments, the following refer- 
ences are given. 

Raths (212) listed likely means of obtaining evidence on the ten areas 
of the Evaluation Staff. More recently, Raths (213) described the applica- 
tion of ten tests to a school system, giving illustrative samples from the 
tests. In another place, he (211) reported that instruments are available 
for help in the evaluation of certain aspects of all ten of the fields pre- 
viously outlined. He described in considerable detail five of the tests. 
Further discussion of the development of these tests is given elsewhere 
(176, 214). Other contributions of the Evaluation Staff are given in refer- 
ences cited in the section which follows. 

Wrightstone (234) showed how data can be gathered on each of the 
six new objectives of elementary education adopted by New York State. 
He (232) also reported the application of new instruments to conventional 
and experimental schools. Diederich (177) listed a variety of observa- 
tional records and other report forms useful in gathering evidence. Zahn 
(237) described values of the anecdotal record. Hopkins (188) listed a 
number of records that should be kept. 

Various new developments in testing were reported by Alschuler and 
Hattwick (162), Barthelmess (164), Eberhart (178), Buckingham and 
Lee (169), Ginsburg (184), and Ralph (209). Ellingson (180) reported 
on the benefit to the faculty of working on the development of an art 
scale. He said that “the most significant single contribution of this experi- 
ment” is the “major change in our attitude toward our own objectives.” 
Stalnaker (219) discussed developments in measuring English composi- 
tions. 


Evaluation in the Eight Year Study 


As already suggested, the Evaluation Staff, under the leadership of 
Ralph W. Tyler, has been active in stimulating teachers to analyze and 
formulate their objectives in specific, concrete terms, and also in producing 
instruments for evaluating pupil growth along the lines of these newer 
objectives. The refinement of objectives and the corresponding instrumental 
developments have been covered in the two preceding sections. It remains 
here to indicate briefly the structure of this organization. 

Five years ago 280 colleges and universities agreed to waive the usual 
entrance requirements for graduates from thirty selected secondary schools 
for a period of eight years, as an experiment. One of the conditions of the 
agreement was that during this period the secondary schools would develop 
means of obtaining and transmitting to the colleges information about each 
student so that the colleges would be able to understand the needs of the 
student and to provide satisfactory guidance for him. The administrative 
aspects and conditions of the experiment have been described by Aikin 
(161) and others (182).² The problem of evaluation, and the steps taken, 
have been outlined in various articles, by Tyler (222, 223), Raths (212, 
214), and others, and are evident in various progress reports of work 
(167, 207, 211, 213, 221). 

The philosophy of the Evaluation Staff is set in Tyler’s words (222: 
413) as follows: “Evaluation is not limited to the giving of examinations. 
It involves the collection of any pertinent evidence which indicates the 
degree to which the school is attaining its objectives; that is, the degree 
to which the desired changes in pupils are actually taking place. . . . In- 
struments of evaluation include observations of pupils, records of their 
activities, products which they make, tests which they take, and other 
procedures for noting their reactions and their development. The kinds 
of appraisal instruments needed depend upon the kinds of changes in its 
pupils which the school seeks to facilitate—that is, upon its objectives.” 

Elsewhere Tyler (187: 10-11) wrote: “The customary method of ana- 
lyzing a course as a preliminary step to making examinations has been 
to analyze only the content of the course. The definition of objectives in 
terms of expected behavior differs from the analysis-of-content method. 
. . . On the usual basis of test construction it would be assumed that the 
student is expected to remember these descriptions. An examination would 
then be constructed which would disclose whether or not the student re- 
members the details of these experiments. In contrast, a definition of objec- 
tives in terms of student behavior does more than indicate the content to 
be covered. It defines the reactions which a student is expected to make 
to this content.” This distinction, which permeates the work of the Evalua- 
tion Staff, is one of the notable contributions to modern testing. It removes 
the concept of testing from the confines of memory responses; it directs 
attention to the broadened field of educational objectives as definitely as 
attention heretofore has been centered on memory. This emphasis, to- 
gether with the statement previously quoted concerning the variety of 
instruments contemplated in an evaluation program, should go a long 
way in removing the distrust voiced by McGaughy (201: 380): “Most of 
the things that can be measured by our present tests, or any that can be 
constructed for objective use in the future, are relatively trivial and unim- 
portant in the program of a good elementary school.” 

² Numerous articles on the project are listed in the Education Index, since 1932, under the head, 
“Progressive Education Association, Commission on the Relation of School and College.” 

The work of the Evaluation Staff has been outstanding for the sincerity, 
the courage, and the resourcefulness with which intangible objectives have 
been defined and means of measuring them sought. The evaluation study 
may well prove to mark a turning point in the history of educational 
measurement. 

Summer workshops—The Evaluation Staff has called in teachers singly, 
or in groups, from the various secondary schools to work at the head- 
quarters office. In July 1936, the summer workshop was inaugurated at 
Ohio State University, to which teachers from the thirty secondary schools 
in the Eight Year Study were brought for discussion of evaluation. At 
that Workshop, in the one held in Bronxville in 1937, and in the five 
Workshops conducted in the summer of 1938 (in Bronxville, Nashville, 
Ann Arbor, Denver, and Mills College), new-type tests were worked out 
cooperatively by teachers and test technicians (207). Plans for 1939 con- 
template the affiliation of these workshops with large universities. The 
work has been sponsored by the Progressive Education Association, with 
the aid of grants of funds from the General Education Board. 


Testing at the College Level 


The colleges have been active in developing and trying out new testing 
instruments and procedures. Much of this work is reported in references 
already cited in connection with earlier topics in this chapter. We shall 
here call attention only to a few publications representative of the work 
going on. In a report edited by Gray (186) five institutions report their 
examination programs, and the methods of improving examinations are 
described in separate chapters by three institutions. An appraisal of the 
work, uses of results, and needed research, are all dealt with. The coopera- 
tive test service is described. Other reports of college testing are made 
by the American Council on Education (163, 187), Bergstresser (167), 
Cheydleur (172), Eurich (181), Gerberich (183), Gore (185), Kent 
(191), Oppenheimer (204), University of Chicago (173), University of 
Minnesota (202), Valentine and Wenrick (224), and Wert (227). 
CHAPTER V 


Developments in Test Scoring and Analysis¹ 


FRED P. FRUTCHEY 


The WIDESPREAD USE of tests as a basis for understanding the needs of 
pupils has resulted in deep inroads on the teachers’ time, or in considerable 
expense and delay, largely because of the time-consuming task of scoring 
the papers and analyzing the results. To meet this problem there have been 
a number of recent developments; earlier ones were reviewed by Lindquist 
and Maucker (253). 


Devices for Identifying Correct Responses 


Most of the devices which have been developed serve to identify rapidly 
the responses which are correct, as a preparatory step to counting. The 
“self-scoring” tests which appeared commercially some fifteen years ago 
were of this type. A number of portable machines for indicating on a 
special answer sheet which responses are correct, appeared some years 
ago, but have had little or no mention in the literature. Stenquist (265, 266) 
and his staff developed a procedure for doing this work on mimeograph 
machines. The pupil’s response is made on a sheet having different spaces 
for different answers to a question; this sheet is then run through the 
mimeograph machine, which indicates, by printing, those answers that 
are in the correct places. Stenquist (266) reported that, “after a few hours’ 
practice, operators can score, with sufficiently perfect precision and vir- 
tually no spoilage, at the rate of from 28 to 35 tests (answer sheets) per 
minute.” 

For hand use, Manuel and Knight (255) developed a stencil containing 
marks at appropriate spaces at the edges of windows. When the stencil 
is placed over the pupil’s test paper in proper position the coincidence of 
the pupil’s marks on the test and the stencil marks indicates correct 
answers, which the scorer counts. The chief advantage of this device 
over the usual scoring keys is that here the scorer needs to keep in mind 
the matching of only two symbols. 
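
The principle of the stencil, counting the places where the pupil's mark coincides with the key's mark, can be sketched as follows. This is only an illustration of the counting idea and not a description of the Manuel and Knight device itself; the item numbers, the lettered answer spaces, and the function name are assumptions made for the example.

```python
def count_coincidences(pupil_marks: dict, key_marks: dict) -> int:
    """Count items where the pupil's marked space coincides with the keyed space.

    Items the pupil left unmarked are absent from pupil_marks and count as wrong.
    """
    return sum(
        1
        for item, keyed_space in key_marks.items()
        if pupil_marks.get(item) == keyed_space
    )


# Hypothetical four-item key and one pupil's paper (item 4 omitted by the pupil).
key = {1: "b", 2: "d", 3: "a", 4: "c"}
pupil = {1: "b", 2: "a", 3: "a"}

score = count_coincidences(pupil, key)
print(score)
```

As with the stencil, the scorer here attends only to whether two symbols match at each position, and the count of matches is the score.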

Toops’ method—In connection with the Ohio State University Intelli- 
gence Test, Toops (271) developed a unique scoring procedure. This test, 
now in its twentieth edition, consists of a test booklet containing the 
questions, and an answer pad of three response sheets with a special guide 
sheet on top of them. These sheets are sealed together and the student makes 
his response by punching with a pointed stylus in the appropriate place, 
making his selection according to the guide sheet on the top. The second, 
third, and fourth sheets, all of which are punched simultaneously, have 
wrong-answer spaces covered with black so that the correct punches are in
white spaces. These sheets are separated, after the testing, by trimming
around the edges.

2 Bibliography for this chapter begins on page 564.

The counting of the correct responses proceeds at the rate of about one 
test a minute, the test covering 150 questions and requiring about two 
hours to take. The advantage of the three different response sheets is that 
they can be given to different persons to score (count), thus providing a 
check on the work. The sheets may in fact be shuffled and passed out to 
the persons who took the test, since the response sheets contain only a 
serial number in place of the person’s name. Each person can count three 
sheets, which may then be reassembled and checked against each other 
for agreement. While the procedure requires a specially prepared set of 
material, it does not call for a special machine in connection with the 
scoring. 
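
The cross-check among the three response sheets can be sketched in a few lines of Python. This is only an illustration, not part of Toops' published procedure: the function name `reconcile_counts`, the serial numbers, the counts, and the rule of sending any disagreeing set back for a recount are all assumptions made for the example.

```python
def reconcile_counts(counts_by_serial):
    """Compare the three independent counts recorded for each serial number.

    counts_by_serial maps a serial number to the three counts of correct
    punches, one from each response sheet (a hypothetical data layout).
    Returns (agreed, disputed): agreed maps serial -> score where all three
    counts match; disputed maps serial -> the counts that need a recount.
    """
    agreed, disputed = {}, {}
    for serial, counts in counts_by_serial.items():
        if len(counts) != 3:
            raise ValueError(f"serial {serial}: expected 3 counts, got {len(counts)}")
        if len(set(counts)) == 1:
            # All three counters agree, so the score is accepted.
            agreed[serial] = counts[0]
        else:
            # Any disagreement is flagged rather than resolved automatically.
            disputed[serial] = tuple(counts)
    return agreed, disputed


# Illustrative data only: two answer pads, each counted by three persons.
agreed, disputed = reconcile_counts({
    101: (112, 112, 112),
    102: (97, 98, 97),
})
print(agreed)    # {101: 112}
print(disputed)  # {102: (97, 98, 97)}
```

Flagging a disagreement, rather than taking the majority count, is a deliberate choice in this sketch; the source says only that the sheets are "checked against each other for agreement."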


The International Test Scoring Machine 


The most outstanding development in the scoring of objective tests is 
the Test Scoring Machine developed by the International Business Ma- 
chines Corporation (244, 247, 248). The imperative need for such a 
machine to facilitate large-scale testing was sensed by Ben D. Wood in 
connection with over 200,000 tests given in the Pennsylvania Study in 
1928. The necessity of extensive research in the design of such a machine 
soon became apparent, and the problem was presented to the International 
Business Machines Corporation. After two years of research, an experi- 
mental model was placed on the market in 1935. This model has been 
carefully studied and improved down to the present time. 

The machine is based on the electrical conductivity of a graphite mark 
of a lead pencil. Responses are made in spaces indicated on a specially 
printed response sheet; special pencils are advised, but are not a requisite. 
The response sheet is then dropped in the scoring machine and a pointer 
indicates the number of correct responses. The response sheets can be pre- 
pared in a variety of forms, adapting them to various types of question 
and test situations. Different weights may be attached to different answers, 
as for the Strong Interest Test. According to the Corporation, “the machine 
records the raw scores in terms of the number of right answers, number of 
wrong answers, rights minus wrongs, rights minus a fraction or multiple 
of the wrongs; or any of these scores may be recorded in terms of per- 
centage. Three part-scores and the score on the total test may be secured 
at one operation” (247:8). A unit for item analysis is being developed. 
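
The kinds of raw score listed in the Corporation's statement can be written out as a short Python sketch. It makes no claim about how the machine itself arrives at these figures; the function names, the answer key, the responses, and the division into two parts are hypothetical, and omitted items are assumed to count as neither right nor wrong.

```python
from fractions import Fraction


def raw_scores(responses, key, wrong_weight=Fraction(1)):
    """Score one response sheet against a key.

    responses and key are equal-length lists; None marks an omitted item.
    wrong_weight is the fraction or multiple of the wrongs to subtract
    (1 gives rights minus wrongs).
    """
    if not key or len(responses) != len(key):
        raise ValueError("responses and key must be non-empty and of equal length")
    rights = sum(1 for r, k in zip(responses, key) if r is not None and r == k)
    wrongs = sum(1 for r, k in zip(responses, key) if r is not None and r != k)
    return {
        "rights": rights,
        "wrongs": wrongs,
        "corrected": rights - wrong_weight * wrongs,
        "percent_right": 100 * rights / len(key),
    }


def part_scores(responses, key, parts, wrong_weight=Fraction(1)):
    """Score named parts of a test; parts maps a name to a (start, stop) range."""
    return {
        name: raw_scores(responses[a:b], key[a:b], wrong_weight)
        for name, (a, b) in parts.items()
    }


# Illustrative six-item test with one omission (assumed data).
key = ["a", "c", "b", "d", "a", "b"]
responses = ["a", "c", "d", None, "a", "a"]

total = raw_scores(responses, key, wrong_weight=Fraction(1, 2))
parts = part_scores(responses, key, {"part_1": (0, 3), "part_2": (3, 6)})

print(total["rights"], total["wrongs"], total["corrected"], total["percent_right"])
# 3 2 2 50.0
```

Exact fractions are used for the weight so that "rights minus a fraction of the wrongs" does not pick up rounding error.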

This machine thus not only does away with the necessity of counting 
but also with the hand manipulation of rights and wrongs after counting. 
Operators of average ability are reported to score from eight to fourteen 
tests per minute. In the New York Regents’ Inquiry into the Character and 
Cost of Public Education, 402,600 tests were scored at an average rate of 
15.6 tests per minute. Other studies of the machine have been reported 
(262, 276). 

Accuracy—Stray marks or light marks on the response sheets sometimes
fail to register on the machine. Studies of accuracy, by check-scoring the 
papers, showed an error of 0.15 percent in the number of items, and an 
error in 2.6 percent of the papers (244, 276). 
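
The two figures answer different questions: one is the share of items scored wrongly, the other the share of papers containing at least one such item. A minimal Python sketch of how both might be computed from a check-scoring follows. The function name and the data are invented for illustration, and the sketch assumes that each point of difference between the machine score and the check score is one mis-registered item, which would undercount cases where errors on the same paper cancel.

```python
def error_rates(machine_scores, check_scores, items_per_paper):
    """Return (percent of items in error, percent of papers in error).

    machine_scores and check_scores are parallel lists of scores for the
    same papers; items_per_paper is the number of items on the test.
    """
    if len(machine_scores) != len(check_scores) or not machine_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    if items_per_paper <= 0:
        raise ValueError("items_per_paper must be positive")
    diffs = [abs(m - c) for m, c in zip(machine_scores, check_scores)]
    item_errors = sum(diffs)                      # assumed: one point = one item
    papers_in_error = sum(1 for d in diffs if d)  # papers with any discrepancy
    n_papers = len(machine_scores)
    return (
        100 * item_errors / (n_papers * items_per_paper),
        100 * papers_in_error / n_papers,
    )


# Illustrative data: four 50-item papers, one of them off by a single item.
item_pct, paper_pct = error_rates([40, 38, 45, 50], [40, 39, 45, 50], 50)
print(item_pct, paper_pct)  # 0.5 25.0
```

Even in this toy case the paper rate is far higher than the item rate, since a single mis-registered item is enough to put a whole paper in error.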

Artificiality—If a test scoring machine, or other device or procedure,
requires a form of recording which is unfamiliar to the pupil, the responses 
will not represent his behavior (knowledge) under other more natural 
conditions. While this fact must be recognized as a possible criticism of 
the use of response sheets for this machine, one must also recognize that 
all paper-and-pencil tests contain certain elements of artificiality, and that 
the difficulty with the special response sheet lies more in its unfamiliarity 
than in its inherent form. While this matter needs further study, and 
special norms may be appropriate where such sheets are used, the matter 
does not appear from the present studies to be serious. A coefficient of 
correlation of .99 was found between the conventional method of scoring 
90 students’ free answers to a general mathematics test and the machine 
scores based on the special response forms. 
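
For readers who wish to run the same kind of comparison on their own data, the product-moment correlation between two sets of scores can be computed as in the Python sketch below. The scores shown are invented for illustration and are not the data of the study cited; the function name `pearson_r` is likewise an assumption.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five pupils under the two methods of scoring.
hand_scored = [12, 15, 9, 20, 17]
machine_scored = [12, 14, 9, 20, 18]
r = pearson_r(hand_scored, machine_scored)
print(round(r, 2))
```

A high coefficient shows only that the two methods rank pupils alike; it does not show that the two sets of scores have the same average, which is why the text notes that special norms may still be appropriate.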


The Cost of Scoring 


Scates (259, 260) reported that the cost of ordinary scoring was 50
percent of the cost of purchasing, giving, and scoring a standard test 
battery, where all operations were paid for, and 10 percent of the scoring 
was checked. Studies of the cost of counting identified correct responses, 
previously described, do not seem to be reported; but if Toops’ procedure 
were followed, the money cost would obviously be little, or nothing. With 
reference to the use of the International Test Scoring Machine, it has been 
estimated that its use saved $15,000 in the Regents’ Inquiry in New York 
State. One important factor in possible saving is the use of the response 
sheets in lieu of marking up test booklets. One city school system reported
saving enough on one order of tests to pay for the rental of the scoring 
machine for two years (244). 


Objections to Scoring Machines 


The development of test-scoring devices in the form of stencils and 
machines, particularly scoring machines, has met with some objections and 
criticisms. The chief objection is that a scoring machine is but a further step 
in mechanizing education. There is a tacit assumption that the machine 
interprets the student’s answers and relieves the teacher of thoughtful con- 
sideration of the individual student. Obviously the machine can only per- 
form a mechanical process and cannot interpret (262, 266). The degree 
to which a part of the process of testing is mechanical provides a place for 
the use of machines and hence for economy of time and labor. The most 
important parts of the evaluation process, however, involve human judg- 
ment and thought in constructing and administering tests, and in inter- 
preting the results. In the testing process human judgment enters, in
deciding for what aspect of development to test, what kind of evidence of
behavior is indicative of that development, how to get a record of the 
behavior, how to bring together all bits of evidence in the test record for 
possible interpretations to be made, how to weight each bit of evidence 
numerically, how to interpret the numerical measures, and what further 
educational experiences to prescribe. These problems cannot be decided by 
a machine; but there is a place in the process which involves computation, 
and a machine can perform the necessary computations at this point in 
the process. 


Various Scoring Studies 


A number of studies have reported on special phases of scoring. Conrad 
(242) and Sims (263) dealt with the scoring of rearrangement tests so as 
to allow for chance. Zerilli (277) and Conway (243) discussed the scoring 
of multiple weighted items, such as the Bernreuter Personality Inventory. 
Klar (250) studied the rating of pictorial compositions; Strong (268) 
reported on the scoring of his interest test by the use of the tabulating 
machine; and Lawson (251) and Bush (240) reported respectively on 
scoring subjective tests and true-false tests. 


Statistical Analysis of Test Scores 


Hollerith tabulating machines have been used to a large extent in the 
analysis of test results on a large scale. Such analyses are reported by 
Wood (275) on college tests, by Lindquist (252) on high-school tests, by 
Strong (268) on vocational interest tests, by Terman and Merrill (270) 
on intelligence tests, and by Kelley (249) on free association test results. 
These reports appeared in a general book on applications of the punched 
card method (239). 

Toops (272) reported on the use of the punched card method in the 
analysis of questionnaire data and methods of constructing questionnaires 
for machine analysis. He pointed out that the machine is a valuable 
stimulus to the researcher because he must think through the whole process 
in preparing data-recording forms, even to the point of deciding upon 
tables to make, and possible interpretations. He must decide ahead of time 
what to do with unexpected responses. According to Toops, “the machines 
act mechanically, blindly, and unintelligently—although with an accuracy, 
speed, and seeming intelligence for those operations for which they are 
set, which to the casual observer, appears superhuman, as indeed it is in 
fact.” 


Interpretation of Test Scores 


Stencils, or similar devices, for the interpretation of test scores have ap- 
peared in at least two publications. Allen (238) published his stencil in 
a separate volume, to be used as a part of a program of adjustment. The 
chart is based on intelligence, achievement, and chronological age. More
recently Voas (273, 274) presented reports of a somewhat similar chart, 
with a chart form designed to aid in the preparation of the chart. The same 
three basic factors or traits are involved. 

Teachers of mature judgment probably regard such devices as suggestive 
but dangerous. If all important factors were measured, and if all factors 
were measured accurately, mechanical devices for interpretation would be 
reliable. But teachers, as well as research workers, know that all measure- 
ments must be interpreted in the light of daily observation over a long 
period of time, and that the results of testing alone, construed in the usual 
sense, cannot be relied upon to furnish data which are the sole basis for 
educational guidance. All mechanical schemes for the interpretation of 
test scores, therefore, must be used against a background of enlightened 
judgment. 


CHAPTER VI


The Educational Measurement Movement in 
Perspective’ 


A. Developments in Educational Measurement 
E. F. LINDQUIST 


Some OF THE MOST SIGNIFICANT of recent developments in educational 
measurement have been neither the direct outgrowth nor the immediate 
object of experimental research. On that account they have tended to be
neglected in reviews of this kind. These developments may be briefly char- 
acterized as follows: (a) a greatly increased emphasis upon the use of 
tests as a means of facilitating individualization in education, or upon the 
use of tests in educational guidance; (b) a consequent demand for in- 
creased comparability in the results obtained from tests; (c) a steady 
growth in the number and scope of cooperative regional testing programs; 
and (d) increasingly successful attempts to measure what heretofore have 
been considered the intangibles in educational outcomes, both for the pur- 
pose of improved educational guidance and for more adequate evaluation 
of current outcomes of educational practices. 


Guidance 


The increased emphasis upon the guidance values of tests has been re- 
flected in the work of all leaders in educational and vocational guidance. 
The references to these uses are too numerous and scattered to permit any 
enumeration or individual summaries of them here. It is now generally 
conceded that one of the major functions of educational measurement is 
to enable the teacher, the guidance counselor, and the school administrator 
to become more intimately and dependably acquainted with each indi- 
vidual pupil, in order that more adequate provision may be made for 
individual differences in all phases of the educational program. This use 
of tests is now regarded by many as not only one of the major functions 
but as the major function of tests, to which all other uses should be defi- 
nitely subordinated and with which no other use should be permitted to 
interfere (283, 292). 


Comparable Scores and Norms 


In accordance with this emphasis, increasingly adequate provisions are 
being made in school practice for the organization and systematic accumu- 
lation of test results and other relevant guidance data on permanent
cumulative record forms for individual pupils. Recent developments in this
area were excellently summarized by Segel (290). These efforts to organize
and integrate available information about individual pupils for more effec-
tive interpretation and use in guidance have drawn attention to the need
for greater comparability in test results. Unfortunately, most of the stand-
ardized tests thus far have been independently constructed with little
regard to the possibility of their collective use in an integrated guidance
program. The norms for these tests have been independently established
at different times and under different conditions, each for a group of
pupils and schools differing in geographical distribution, in type of organ-
ization, and in level of achievement from those used in the standardization
of the other tests. Because of these variations, the norms have differed con-
siderably, sometimes by as much as several grade levels, even for tests
intended for the same subject. Consequently, when an educational profile
was constructed on the basis of percentile or grade norms for a number
of these tests, it was impossible to tell to what degree the peaks and
troughs in that profile were due to real differences in the abilities of the
pupil and to what degree they merely represented accidental variations
in the norms provided for the tests. The urgency of the need for increased
comparability in test results, from the point of view of educational guid-
ance, was expressed by Wood (287).

1 Bibliography for this chapter begins on page 565.

In general, high comparability in results for any set of tests can best be 
obtained by establishing the norms for all of these tests at the same time 
and under the same conditions, for exactly the same group of pupils and 
schools. It is here that the cooperative or regional testing program makes 
one of its most important contributions. Through such programs, it is not 
only possible to secure highly dependable and meaningful norms on each 
test individually, because of the size and homogeneity of the population 
used, but also to establish norms simultaneously on a large number of 
tests for the same population and to maintain comparability in these 
norms from year to year. In fact, it now appears that cooperative organi- 
zation in testing, such as that represented by the Cooperative Test Service 
of the American Council on Education or by the various regional testing 
programs, is the only practicable means of establishing norms of this 
type. This and other advantages of wide-scale cooperative organization 
in achievement testing were summarized in a bulletin (284) describing 
the 1938 Iowa Every-Pupil High School Testing Program and are dis- 
cussed also in the descriptive literature provided for many other programs. 
The growth of this cooperative movement is evidenced by the fact that 
some type of organized program is now in operation in some twenty-six 
states. It appears likely that this trend will continue, and that in the future 
the great bulk of all testing for guidance will be done through the regional 
testing program. 

In spite of the importance of the problem, relatively little research has 
been reported on this matter of comparability. A study by Crawford (279), 
not hitherto reported in the literature, set forth convincing data on the
operation of certain factors, such as chronological age, mental age, grade 
placement, and school progress in test norms. The results indicated a 
definite need for much more effective control of these factors in selecting 
the population for the determination of test norms. Crawford suggested 
the establishment of norms on groups selected for normality in each of 
the factors named above, and pointed to the need for much more highly 
refined norms in most fields of educational measurement. 

A scaling technic which is intended to control many of the variables 
ordinarily present in comparisons of test results is proposed by Flanagan 
(280). Scores expressed on a common scale are now provided with most 
of the tests published by the Cooperative Test Service. A score of 50 on 
this scale represents a score which the average child would make at the 
end of the particular course tested if he had attended an average school 
and had taken the usual amount of the subject in question. A Cooperative 
Test Service booklet containing a complete discussion and explanation 
of this system of scaled scores is now being prepared and should become 
available before the publication of this review. 


Measuring Intangible Outcomes 


One of the most encouraging of recent major trends is evidenced in 
the numerous, and increasingly successful, attempts to define more clearly 
and to measure more objectively the attainment of some of the more 
intangible educational objectives which have heretofore been neglected. 
A significant proportion of recent contributions of this type have come 
from the Evaluation Staff of the Commission on the Relation of School 
and College of the Progressive Education Association. By agreement with 
two hundred and eighty American universities and colleges, thirty sec- 
ondary schools preparing students for colleges have been freed from the 
usual college entrance requirements and entrance examinations, and have 
thus been able to introduce experimentally certain important modifications 
in their educational offering. The task of the Evaluation Staff has been 
to develop procedures by which the changes taking place in the boys and 
girls in these schools may be identified and by which each school may 
discover from year to year how well it is accomplishing its educational 
purposes. The essential features of this evaluation program were described 
by Tyler (291) as (a) the use of the major educational objectives as the 
basis from which the evaluation program proceeds, (b) a conception of 
appraisal which is not limited to tests and examinations, and (c) a cooper- 
ative activity in which individual schools working with an advisory tech- 
nical staff are developing new appraisal instruments where satisfactory 
instruments are not available. 

Some of the work of the Evaluation Staff has been reported by Wright- 
stone (294, 295, 296). It is hoped that some of the instruments produced 
in this Evaluation Study may soon be made available for general use. 

Numerous other efforts to measure hitherto neglected educational outcomes
have been reported in the literature. Frutchey (281) described a
cooperative program developing the ability to use a scientific method in 
college sciences. Buckingham and Lee (278) reported on a technic for 
testing unified concepts in science. McDowell and Anderson (285) de- 
scribed a test of the ability of pupils to outline. Noll (286) discussed the 
measurement of the scientific attitude, and Grim (282) described a technic 
for the measurement of attitudes in the social studies. The students of 
Remmers (289) at Purdue have been particularly active in developing 
and applying scales for the measurement of generalized attitudes. 


B. Current Criticisms of Educational Measurement 
S. A. COURTIS


The measurement movement in education always has been criticized 
and it is safe to prophesy that it always will be. It should be. 

At the turn of the century when the concepts of modern educational 
measurement were just being formulated, and survey measurement activi- 
ties were novel, misunderstandings and criticisms were inevitable. The 
center of emphasis then was almost wholly upon measurement of efficiency 
in the tool subjects of the elementary grades. Today measurement has 
spread upwards to the colleges and adult education (299, 300, 303, 305), 
inward to the measurement of “intangibles” (315), and downward to the 
preschool child (298, 301). Its purposes are as broad as science itself. 
But even today, as the influence of the movement reaches new areas and 
fields, the old conditions of novelty and misunderstanding are recreated 
in those fields. Fresh critics voice the limitations and deficiencies of 
measurement (297, 307), and in reply others reformulate, in modern 
terms, fundamental purposes and warnings (308, 311, 312). 

During the past three years opposition to measurement has not been 
much in evidence in the literature, but under the surface, among the rank 
and file, there are still dissatisfactions, as those in touch with teachers 
know (309, 310). It is, nevertheless, encouraging to find that out of one 
hundred and forty-three persons in administrative positions in our schools, 
less than ten considered the measurement problem urgent enough to feel 
that it should be discussed with parents (313). The really vital criticisms 
do not appear in print as such, but take form as efforts to achieve better 
tests, procedures, or statistical technics. The measurement movement as 
a whole progresses by these small steps of advance, each of which has 
motivating drive in some real dissatisfaction. 

Among the severest critics are workers within the movement. Thus, one 
leader in the field of factor analysis wrote: “A large majority of these 
papers involve misinterpretation of the factorial methods. . . . If the 
misapplication of factor methods continues at the present rate, we shall 
soon find general disappointment with the results because they are usually
meaningless as far as psychological interpretation is concerned” (314). 
Another statistician said: “Possibly a few would take issue with the first 
part of this battle cry (that anything that exists at all exists in some 
quantity); some would deny that all traits are susceptible of quantitative
measurement; and many would agree that not all we attempt to measure 
exists. . . . The entire conceptual basis underlying factor analysis does 
violence to all that is known about the processes of growth and develop- 
ment. . . . When as sometimes happens factors wrung out of an analysis 
are interpreted as some kind of stable entities, progress toward personality 
measurement would seem to be impeded. . . . It will doubtless be neces- 
sary to resort to cumulative developmental studies” (302). One psychol- 
ogist fulminates: “From a certain point of view the history of mental 
testing is primarily a history of the idol worship of the parameters of the 
bilaterally symmetrical curve. . . . It is unfortunate that so many laymen 
have been so easily misguided into the faith that any wishful product of 
imagination converted into numbers by authoritative proclamation con- 
stitutes science and scientific method” (306). There are even measurement 
men who, on the basis of evidence conclusive to them, take the position 
that “no single test and no battery of tests of any type or description yields 
unambiguous information about the quantities educationalists wish to 
measure. . . . They (the present day tests) are not more adequate than 
were the measuring instruments of the alchemists and the astrologers” 
(304). 

Certain persons appear to be much distressed by such criticisms; but 
some criticism is an aid to healthy growth. It appears that more criticism 
is directed against the theory than the practice and becomes largely an 
intellectual matter. By way of analogy, we may note that in Berne, Switzer- 
land, and in many other cities of the Old World, fourteenth-century town 
clocks, whose machinery was designed on the basis of the now discredited 
Ptolemaic astronomy, still keep good time. So in education our tests and 
theories of measurement may be totally invalid, but it cannot be denied 
that many a teacher has been stimulated by their use to new effort, new 
enjoyment of his work, and new interests in his children. In spite of the 
opinions of the extremists who characterize the prevailing uncritical but 
practical use of measurement by schools as the grossest pseudo-science, 
such measurement may serve useful ends. To some extent we are all prag- 
matists; if we believe we can get benefit from an activity, we are likely 
to continue the activity. 

The danger is that we shall be too pragmatic. While continuing to do 
things which are useful, we must keep alive those critical faculties by 
which the conventional and the plausible may be unmasked and errors 
of direction detected. For in the long run, it is truth alone that enables 
man to extend his conquest and control of nature. 


C. Past and Present Trends in Educational Measurement
S. A. COURTIS


The determination of trends is a matter of judgment and interpretation. 
Even the tabulation of frequencies of articles rests upon judgment as to 
the type into which any given article falls. Any value in the discussion of 
trends below must therefore be sought more in the suggestiveness of its 
interpretations than in its factual basis. 

The present major trends in educational measurement are judged to 
be five in number as follows: 


1. A trend toward standardization
2. A movement away from the determination of laws
3. A growing dependence upon statistical analysis and deductive reasoning
4. An increase in observational personality and character rating
5. A greater emphasis on longitudinal studies of individuals.


Background 


In discussing present trends, some reference should be made to the 
contributions of the past. From Rice (1894) to Thorndike (1910), edu- 
cational measurement was little more than a novel and interesting varia- 
tion of the conventional examination—the attempt to adapt to education 
some of the methods and procedures used by Cattell (1885) and other 
early psychologists in the study of individual differences. Thorndike’s 
handwriting scale, the first calibrated educational ruler, supplied, not 
another novel examination, but something new: units of measurement
based, supposedly, upon a universal law or principle (the Fullerton- 
Cattell Equal Difference Theorem). It was hailed with an enthusiasm which 
measurement men of this generation would have difficulty in appreciating. 
Finally, many believed, the day of exact science had dawned for educa- 
tion. At last precise knowledge, prediction, and control were to be had 
in return for scientific effort.

The period from 1910 to 1920 was one of rapid growth and creativity. 
Binet’s concept of mental age was just beginning to influence psychological 
thinking, and education went “scientific” with a vengeance. Hillegas, Buck- 
ingham, Ayres, Trabue, Woody, Terman, and others discovered ways of 
extending basic concepts to other subjectmatter fields. Otis devised the 
group intelligence test that enormously extended the range of mental meas- 
urements. Tests and test-users multiplied at a rapid rate. The utilization 
of tests by the United States Army during the War served to bring meas- 
urement to popular notice and give it a prestige it might otherwise have 
taken many years to acquire. 

The decade from 1920 to 1930 was the period of the great depression in 
measurement. The novelty and the glamor of a new fad was over. Funds 
were increasingly scarce. Creative attention shifted to other fields—curricu- 
lum revision, the activity movement, the development of better school- 
community relationships. School measurement activities continued as
routinized reminders of the obligation of education to be scientific and as 
a fruitful field of theses subjects for candidates for academic degrees. Cor- 
relational studies multiplied; then a new type of creative activity came 
to the fore. Spearman’s emphasis in England on general and specific 
factors stimulated an entirely new approach to the problem of test con- 
struction, ably propagated in this country by Holzinger, Thurstone, and 
others. And thus a new upward surge or cycle of development came into 
being. It is from the perspective of these three periods that the trends 
in 1936, 1937, and 1938 are viewed. 


Standardization 


Bureaus of educational research found it necessary to standardize tests 
and procedures within the school systems they served. The same need ex- 
isted over larger areas, and a number of states developed statewide testing 
programs, some of which were remarkable for their consistent self-improve- 
ment through critical analysis. Nationally, the Cooperative Test Service of 
the American Council on Education and similar agencies (320, 324) are 
rapidly standardizing measurement instruments and procedures as much 
as it is possible to standardize them in a democratic country. The invention 
of mechanical aids to scoring is an aid in the same direction. 


Fundamental Laws 


Of the original impetus—the creation of a science of education in terms 
of law, prediction, and control—little remains. The language persists, but 
not the substance. From time to time new attempts are made to discover 
law or to establish units (329, 330, 332, 345), but such factors as the
growing confusion of fact and theory in physics and astronomy, the
increasing dependence on statistical procedures (justified in terms of logic,
not experimentation, and including the complacent acceptance of correlation
coefficients of .40 to .70 as indications of satisfactory prediction) (327, 344),
and the increasing separation between test-makers and test-users, are all
developing a widespread conviction that extraneous units, laws, and control
in human behavior are neither feasible nor desirable.
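
Why coefficients of this size make for weak prediction can be seen from a little arithmetic, sketched here in Python. The coefficients cited above are read as .40 and .70. The two summary figures used, the squared coefficient and the square root of one minus the squared coefficient (the "coefficient of alienation"), are standard quantities brought in for illustration and are not taken from the studies cited; the function name is hypothetical.

```python
import math


def prediction_summary(r):
    """Return (r squared, coefficient of alienation) for a correlation r.

    r squared is the proportion of variance in one measure accounted for by
    the other; sqrt(1 - r squared) is the ratio of the standard error of
    estimate to the standard deviation of the measure being predicted.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    shared = r * r
    alienation = math.sqrt(1.0 - shared)
    return shared, alienation


for r in (0.40, 0.70):
    shared, alienation = prediction_summary(r)
    print(f"r = {r:.2f}: r^2 = {shared:.2f}, sqrt(1 - r^2) = {alienation:.2f}")
# r = 0.40: r^2 = 0.16, sqrt(1 - r^2) = 0.92
# r = 0.70: r^2 = 0.49, sqrt(1 - r^2) = 0.71
```

On this reckoning even a coefficient of .70 leaves the error of prediction at roughly seven-tenths of what it would be with no predictor at all.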

It is felt by some that measurement has failed of its early promise. 
In teaching and pupil administration, we are scarcely nearer a science 
of education today than when we started. The reason is that technics of 
measurement and statistical methods of analysis have had to be transferred 
from static fields to a dynamic one. In physics, the length of a bar of iron 
changes only as conditions change, and these can be largely controlled. 
Static measures and interpretations suffice. In education, however, the child, 
learning, and ability are living, growing entities. They are affected by 
subtle factors and change rapidly. They can be correctly measured and 
appraised only in terms of units and procedures adapted to living and 
rapidly changing traits. It is significant that even the first attempts to
move in this direction have been productive (321). 


Dependence on Deductive Reasoning 


The element which more than any other differentiates science from all 
other forms of human achievement is its acceptance of an authority outside 
itself as a court of last resort. The basis of the scientific method is the 
pragmatic empiricism of the cycle—experimentation, generalization, pre- 
diction, and verification. 

Mathematics, on the other hand, is deductive. Statistical procedures start 
from assumptions and grow by logical deduction. It is interesting to 
follow the derivation of a theorem from an assumption, particularly in 
a field where much experimental evidence contrary to the theorem is 
available (340). Often it is not possible to check the end product of de- 
ductive reasoning experimentally, and when it is done the result is often 
surprising (317). Increasingly in measurement articles, justification is 
based upon logical, mathematical reasoning instead of upon concrete 
experimental evidence. 

To the degree that such action is taken, the gap between the specialist
in test and scale construction, and the user of tests in the classroom, is 
likely to become widened. Teachers cannot read and understand current 
discussions of item analysis, factor loadings, correlation pathways, and 
the host of statistical methods and proofs to be found in measurement 
articles (316, 331, 333, 334, 343), but they do sense the contrast between 
the promises of educational measurement and the actual inadequacies of 
tests in the classroom. Deductive reasoning has its place, but there is need 
to ask the question, Is the trend toward increasing statistical complexity, 
when unchecked by objective experimentation, a desirable advance or a 
menacing illusion? 


Measurement by Observational Rating 


Teacher antagonism toward measurement has frequently been based on 
the wide separation between the narrow products measured by the early 
tests and the objectives for which the teachers are working. Character, 
personality, and ideals are almost universally acknowledged to be higher, 
more important goals than elemental knowledges and skills. There has 
been an increasing trend toward some form of quantitative evaluation of 
the so-called “intangibles.” In this direction may be noted the time- 
sampling technic, the behavior rating scales, and other observational 
ratings (335, 338, 342), either direct or aided by objective devices such as 
the tests developed by the Eight Year Study of the Progressive Education 
Association (336, 337, 341). The influence of this trend has extended far 
and wide, two of the most notable results being the revision, on all educa- 
tional levels, of examinations (319) and of marking (318). 


Longitudinal Studies


The event of the period covered by this review has been the flowering, 
so to speak, of the Harvard Growth Studies (323) and others, and the 
publication of the monographs of the Society for Research in Child De- 
velopment (339). A new term has been coined, “longitudinal,” in contrast 
with “cross sectional,” measurement, and an idea long gestating has come 
to birth, namely, that interpretation of a child’s measurements should be 
in terms of his own growth curve and not in terms of norms derived from 
mass measurements. Longitudinal studies demand new types of records, 
new controls, new statistical methods. Although Baldwin and his succes- 
sors started collecting cumulative records on individual children many 
years ago, few persons even yet sense the implications and potentialities 
of this form of measurement. 
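
The contrast between the two kinds of interpretation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores, the norm group, and the function names `norm_z_score` and `own_growth_gains` are hypothetical and are not drawn from the Harvard Growth Studies or any other study cited here.

```python
from statistics import mean, stdev


def norm_z_score(value, norm_values):
    """Cross-sectional view: where one score stands relative to a norm group."""
    return (value - mean(norm_values)) / stdev(norm_values)


def own_growth_gains(measurements):
    """Longitudinal view: successive gains along the child's own record."""
    return [later - earlier for earlier, later in zip(measurements, measurements[1:])]


# Hypothetical data: scores of a norm group at age 10, and one child's
# scores on the same test at ages 8, 9, and 10.
norm_age_10 = [28, 31, 33, 35, 38]
child_scores = [20, 24, 29]

z = norm_z_score(child_scores[-1], norm_age_10)  # negative: below the group mean
gains = own_growth_gains(child_scores)           # [4, 5]: steady, rising growth
print(round(z, 2), gains)
```

Judged against the norm, this hypothetical child appears below average; judged against his own curve, he is gaining steadily, which is the distinction the longitudinal approach is meant to bring out.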

Mention should be made of the publication by Dearborn (323) and by
Davenport (322) of their full raw data, which makes available to all
the wealth of information painstakingly collected over many years.


CHAPTER VII
Recent Literature on Testing¹


DOUGLAS E. SCATES 


The purpose of this chapter is to list the recent systematic literature
dealing with testing and to report technical references on test construction. 
The entire bibliography of the present issue of the Review of Educational 
Research is of course a presentation of recent literature on testing, and 
this issue may be regarded as one item in the list of bibliographies pre- 
sented below. 


Bibliographies of Tests 

The basic bibliography of commercially available tests at the present 
time is by Hildreth (352), and covers educational, intelligence, personality, 
and environment measures. Information on new tests has been kept up to 
date by Buros (346, 347, 348), both in a series of annual bulletins and in 
the monthly checklists appearing in the Education Index (350). This year 
Buros (348) inaugurated a test review service which is unique and should 
prove valuable. It is designed to render the same service with reference 
to tests as book reviews render for new books. 

Odell (356, 357) revised and brought up to 1936 his earlier test bibli- 
ographies. Catalogs of test depositories also are serviceable for current 
information. Other lists of tests will be found in the discussions referred 
to in the sections which follow. For descriptions of some of the newer 
testing procedures designed to gather evidence on some of the less tangible 
aspects of growth, one should consult the references cited in Chapter IV 
in the section on “Examples of Newer Measuring Instruments.” 


Bibliographies and Digests of Achievement Test Literature 


The bibliographies and discussions in the Review of Educational Re- 
search (358) on educational tests in the preceding cycle may serve as a 
starting point. South (360) prepared an extensive index of periodical 
literature on testing, which is arranged by author, with a subject index. 
Monroe and Shores (355) covered test bibliographies from 1910 to 1935, 
under such heads as “Test Construction,” “Testing Programs,” “Tests and 
Scales,” and cross references were given. Jones and Brown (353) in 1935 
summarized recent educational test literature in some detail. 

Swineford and Holzinger (361) published annotated selected references 
each year since 1933, adding the topic “Factor Analysis” for references 
published in 1936 and later years. The United States Office of Education 
(351) publishes annually its lists of research studies, which are predom- 
inantly master’s and doctor’s theses. Testing is covered under a general 
head “Tests and Testing,” with five subdivisions, in the table of contents,
and mostly under “Tests and Scales” in the subject index at the back. Many
of these references are annotated.

¹ Bibliography for this chapter begins on page 568.

The Education Index (350) must be regarded as the most important 
source of information on current testing literature. The principal heads to 
consult are: “Achievement Tests,” “Educational Measurements,” “Tests 
and Scales,” “Intelligence Tests,” “Personality Tests,” “Social Intelligence 
Tests and Scales,” “Behavior Tests and Scales,” and numerous cross ref- 
erences to other heads. Buros (347, 348) should be consulted for recent 
books on testing, including excerpts from reviews of books. The books and 
reviews on statistical and research methodology in the 1938 edition of his 
bibliography have also been printed separately (349). 

In the bibliography of Canadian education compiled by Smith (359), 
references to testing will be found under the heads “Educational Measure- 
ments,” “Tests,” and “Factor Analysis.” 


Literature on Intelligence and Personality Measurement 


Although theoretically this literature falls outside the scope of the present 
issue of the Review of Educational Research, a few summaries can be 
referred to briefly. The June 1938 issue of the Review (363) dealt with 
these areas, as did also issues in two earlier cycles. The works referred to 
above (347, 348, 350, 353, 355, 359, 360, 361) will yield references in 
these areas as well as in achievement if the proper sections or heads are 
consulted. 

In the field of personality and environmental measurement Symonds’ 
1934 publication (362) may be considered a basic work still, giving half 
of its pages to a description of data-gathering instruments. Maller (354) 
revised his earlier list of tests bringing them down to 1937. Traxler (364) 
printed a brief list, with discussion, and Vernon (365) did somewhat the 
same thing in England. 


Textbooks Dealing with Achievement Testing 


Books, textbooks and others, which have appeared in the field of 
measurement—achievement, intelligence, personality, and environmental 
—beginning with 1933, are listed, with reviews, by Buros (347, 348). We 
present here a list limited to textbooks in the field of achievement tests pub- 
lished since the middle of 1935. Buros may be consulted for reviews of 
these books. 

Greene and Jorgensen (367, 368) revised, expanded, and divided their 
earlier treatment so that separate books are now available for elementary- 
and high-school fields. The books contain a sufficient amount of material 
in common so that they can be used together in the same class, and can 
be divided according to student interest. Greene collaborated with New- 
kirk (371) in the preparation of a volume especially for industrial educa- 
tion. Lee (370) produced a textbook for high-school tests, and Orleans 
(372) and Rinsland (373) published general texts in 1937. Orleans
emphasized use and Rinsland prepared the most complete assemblage of
rules to date for making objective tests of various kinds. They are addressed 
primarily to the classroom teacher. The American Council on Education 
sponsored the production of a book on test construction (369) which repre- 
sents a composite of points of view. It is the most thought-provoking book 
yet to appear, though pitched somewhat above the level and range of inter- 
ests of typical graduate classes. Wright (375) prepared a “measurement- 
at-a-glance” type of outline. Smith (374) attempted to center attention 
on principles. In general, the textbooks of this period evidence a tendency 
away from the detailed description of commercially available tests. 


Literature on Statistical Methods in Test Construction 


Cureton and Dunlap summarized the literature on statistical contribu- 
tions to test construction and analysis in the June 1938 Review of Educa- 
tional Research. We present here a number of references which continue 
their bibliography (376-404). In view of the recency of their treatment, 
it does not seem appropriate to summarize the material at this time; it is 
therefore presented simply as a checklist. Present plans call for another 
summary of statistical material in the December 1939 issue of the Review. 
For a checklist of books, bulletins, and monographs dealing with statistics 
in general, Buros (347, 348, 349) may be consulted. 

This bibliography emphasizes statistical technic. For references on test 
construction which place less emphasis upon statistics, consult numbers 
(369, 373), and references cited in the last three sections of Chapter IV. 


Statistics in Test Development and Interpretation *


The rapid strides which have been made recently in the effective use of 
statistical methods in the development, critical refinement, and more 
meaningful interpretation of educational tests are very encouraging signs 
of progress in educational measurement. Many competent mathematicians 
have come to recognize the contributions which their scientific training 
may make to theory and practice in educational measurement. The result 
has been a marked improvement in the training of research students in 
education. Today a rather surprising number of educators are qualified by 
training and experience to make critical and legitimate use of refined 
statistical procedures in the field of measurements. Psychologists have met 
their problems in a similar way, with the result that many significant 
contributions to the literature of measurement dealing with critical refine- 
ments and meaningful interpretations of test results have been made by 
those interested mainly in the psychological aspects of the problem. Thus, 
educators and psychologists, both being concerned with problems of 
measurement, are both concerned with the contributions which statistical 
technics can make to the solution of their problems. 


* Paragraph by Harry A. Greene.


Achievement, and intelligence, 499;
studies of, 498; see also particular sub- 
ject field, prediction of school success, 
tests and scales 

Adapting instruction to pupils, 525, 542 

Algebra, achievement in, 504 

Anecdotal records, 515

Arithmetic, achievement in, 504 

Attitudes, measurement of, 545 


Bilingual pupils, see nationality 


Checklist, 514 

Class size, 511 

College, see higher education 

College teaching, see particular subject 
field 

Composition, see English 

Cooperative testing, see regional testing 

Criticisms, of measurement, 545; of re- 
search, 497 


Diagnosis, 513; in arithmetic, 505 
Drawing, 500 


English, achievement in, 506; see also 
literature 

Essay tests, see tests and scales 

Evaluation Staff (of the Progressive Edu- 
cation Association), 514, 534, 544 

Experimentation, measurement in, 497 


Factor analysis, criticisms of, 545 
Foreign language, achievement in, 507; 
in homes, 499; see also nationality 


Genetic studies, 549 
Geometry, achievement in, 504 
Growth curve, 506 

Guidance, 542 


Handedness, see laterality 

Handwriting, 508 

Higher education, measurement, 497, 536; 
see also prediction of school success 

History, see social studies 

Hollerith tabulating machines, 540 


Individual differences, see adapting in- 
struction to pupils, variability between 
individuals 

Individualized instruction, see adapting
instruction to pupils 

Intelligence, and achievement, 499 

Interpretation, of test scores, 540 

Interview, for diagnosis, 505 


Laterality, 504 
Left-handedness, see laterality 
Literature, measurement of, 528 


Marks, 499 

Mathematics, see algebra, arithmetic, 
geometry 

Measurement, criticisms of, 545; history, 
547; in higher education, 497; in re- 
search, 497; influence on education, 
529; of intangible outcomes, 544; phil- 
osophy, 531; purposes, 528; trends, 
546; see also factor analysis, objec- 
tivity, regional testing, tests and scales, 
validity 

Motion pictures, 509 

Music, achievement in, 499, 510; and in-
telligence, 499; and nationality, 510;
predicting, 510 


Nationality, and language, 499; and 
music, 510; and reading, 503 
Norms, 542; use of, 525 


Objective tests, see tests and scales 

Objectives, broadened, 533 

Observation, 549; by teachers, 515; of 
pupils, 534 


Poetry, 507, 511

Prediction of school success, 500, 506 

Prevention, of achievement deficiencies, 
513 

Profiles, 543 


Reading, achievement, 501; diagnosis, 
503, 514; effect of typography, 503; 
in college, 503; and intelligence, 502; 
and nationality, 503; readiness, 501; 
remediation, 502, 514; and visual
efficiency, 504; see also phonics

Records, of pupils, 534; see also anecdotal 
records 

Regional testing, 543

Reliability, of essay tests, 521 

Remediation, 513 

Research, see experimentation, measure- 
ment 


Scoring, 537; machine, 538; of essay
tests, 521

Sex differences in reading, 503

Social studies, achievement in, 509; pre-
diction, 510

Sound recording, 498, 507; of tests, 498

Spelling, measurement of, 507

Summer workshops, 536


Tests and scales, by sound, 498; essay 
tests, 517; objective tests, 498, 518; 
studies of, 497; see also measurement, 
validity, and particular subject field


Visual efficiency and reading, 504



Academic success, 17, 51, 247, 275, 500,
506; see also particular subject field 

Acceleration, 251, 512 

Accidents, 266, 375; see also safety, 
emotions 

Accomplishment quotient, 244, 250, 308 

Accounting, bibliography, 200; financial, 
142; for supplies, 390; higher educa- 
tion, 146; internal, 144; property, 144; 
see also budgeting, business admin- 
istration, finance 

Achievement, and intelligence, 499; fac- 
tors related to, 19; studies of, 498, 554; 
see also academic success, tests and 
scales, particular subject field 

Adapting instruction to pupils, 35, 60, 
70, 525, 542 

Adjustment, surveys, 293 

Administrative units, 432 

Algebra, achievement in, 504; see also 
mathematics 

Anecdotal records, 515 

Apparatus, 380 

Appraisal, of buses, 428; of heating serv- 
ice, 373; of lighting, 401; of reports, 
159; of school buildings, 420, 425 

Aptitudes, see academic success, voca- 
tional aptitude, particular subject field 

Arithmetic, achievement in, 504; develop- 
ment of concepts, 236 

Art, aptitude, 263; bibliography, 75; 
measurement, 8; needed research, 10; 
psychology of, 7; teaching of, 7; value 
of, 7; vocabulary, 9 

Athletic ability, see motor abilities

Athletics, see health and physical educa- 
tion 

Attendance units, 126 

Attitudes, in American history, 72; meas- 
urement of, 46, 63, 269, 276, 312, 545; 
scientific, 63; social, 67; surveys of, 
298; see also racial prejudices, religious 
attitudes 

Auditorium, 382 

Automobile driving, 263; see also safety 

Aviation, aptitude, 265 


Bibliographies, accounting: financial and 
property, 200; applications of intelli- 
gence testing, 327; applications of tests 
of non-intellectual functions, 353; art, 
75; budgetary procedure, 199; char- 
acter education, 75; commercial sub- 
jects, 77; cost of school buildings, 480; 
court decisions in the school plant
field, 488; developments in statistical
methods related to test construction,
357; developments in test scoring and 
analysis, 564; education and psychol- 
ogy, 80; educational costs and their 
analysis, 201; educational measure- 
ment movement in perspective, 565; 
educational prevention, diagnosis, and 
remediation, 558; English language, 
reading and literature, 81; equipment, 
apparatus, and supplies, 471; essay- 
type test, 559; financial implications 
of school organization, 196; financial 
planning, 194; financial reporting, 202;
foreign languages, 87; foreign school
buildings, 481; health and physical
education, 88; heating, ventilation, and 
sanitation in school buildings, 476; 
home economics, 90; improvement of 
classroom testing, 560; industrial arts, 
92; insurance, purchasing, and stores 
management, 205; intelligence tests, 
318; mathematics, 95; music, 96; operation
and maintenance of the school
plant, 466; personality and character 
measurement, 340; plant development 
for higher education, including junior 
colleges, 483; pupil transportation 
equipment, 485; recent literature on 
testing, 568; salary scheduling, 204; 
school illumination, 477; science, 97; 
social studies, 98; state studies of local 
school units as related to the school 
plant, 486; status of research in the 
school plant field, 490; studies of edu- 
cational achievement, 554; support of 
education: federal, state, and local
funding, 206; support of education— 
major problems, 190; technics of school 
building surveys, 483; tests and studies 
of infants and young children, 320; 
trends in school architecture and de- 
sign, 487; vocational aptitude tests, 334 

Bilingual pupils, see nationality 

Birth rates, 243 

Blackboards, 379, 412; lighting, 380 

Bonds, court decisions, 453 

Bookkeeping, teaching of, 17 

Budgeting, 120, 133; bibliography, 199; 
evaluation, 134; see also accounting, 
business administration, finance, plan- 
ning 

Buildings, see school buildings 

Business administration, 146, 379; evalu- 
ation of research, 187; needed research, 
187; see also accounting, budgeting,
finance, purchasing, transportation of 
pupils 

Business management, 391; see also ac- 
counting, equipment, inventory, main- 
tenance, operation, purchasing, school 
buildings, supplies 


Cafeteria, see lunchrooms 

Causation, 310; see also correlation 

Capital outlay, 122; and size of admin- 
istrative unit, 432; court decisions, 453; 
federal support for, 179; for higher 
education units, 458; state support for, 
183; see also debt service, finance 

Character, bibliography, 75; education, 
11, 40; measurement of, 269, 302, 340;
see also moral behavior, social adjust- 
ment and behavior 

Cheating, 11, 13, 285 

Checklist, for behavior, 514; for build- 
ings, 422; for fiction, 30; of building 
costs, 411; for economy, 374; for main- 
tenance, 378; for safety, 375 

Child psychology, see preschool children 

Civic training and attitudes, see social
studies 

Class size, 511 

Cleaning, see blackboards, floors, laundry, 
operation of school plant 

Clerical aptitudes, 257 

Clothing, see home economics 

College, see higher education 

College teaching, 19, 25; see also par- 
ticular subject field 

Commercial subjects, bibliography, 77; 
equipment, 386; prognosis, 17; teach- 
ing of, 15 

Community use of school plant, court 
decisions, 457 

Composition, see English 

Conservation, 69 

Consolidation, 127, 432; see also school
districts 

Consumer education, 44, 69 

Contemporary problems, 14 

Cooperative testing, see regional testing 

Correlation, of abilities, 225, 235, 244; 
statistics, 308; see also causation 

Cost, bibliography, 201; of buildings, 408;
of education, 148; see also finance, unit 
costs 

County units, 128, 131, 432 

Court decisions, on finance, 451; on in- 
surance, 455; on purchasing, 451; on 
school buildings, 451, 488 


Criticisms, of measurement, 545; of re- 
search, 497 

Current events, 72 

Curriculum, see particular subject field 


Debt service, 113; see also capital outlay, 
finance 

Delinquency, 254, 303; see also moral 
behavior 

Development, see genetic studies 

Diagnosis, 513; in arithmetic, 505 

Difficulty, see reading, vocabulary 

Documentary frequency studies, 69 

Drawing, 501 


Economies, 167, 373, 395; through re- 
organizing districts, 127; see also con- 
solidation, finance 

Educational Policies Commission, 128 

Educational tests and their uses, 493-596 

Emotions, and accidents, 266; and driv- 
ing, 263; development of, 237; varia- 
bility, 293 

Engineers, 369, 379 

English, achievement in, 506; bibliog- 
raphy, 81; composition, 25; errors, 26, 
28; freshman course, 25; grammar, 18, 
26; high-school curriculum, 25; meas- 
urement, 27; objectives, 26; prognosis, 
25; teaching, 25; usage, 27; see also 
literature 

Environment, and intelligence, 235, 241; 
and personality, 238; and academic suc- 
cess, 248 

Equalization, see local support, federal 
support, state support 

Equipment, 380; bibliography, 471; see 
also lighting, seating 

Errors, see particular subject field 

Essay tests, see tests and scales 

Eugenics, 243 

Evaluation, of budgets, 134; of fiction, 
30; of periodicals, 31; of research, 34, 
187; see also measurement 

Evaluation Staff (of the Progressive Edu- 
cation Association), 514, 534, 544 

Excursions, school, 70 

Experimentation, measurement in, 497 

Experiments, 310; on infants, 229 

Extermination of pests, 375 


Factor analysis, 231, 239, 257, 286, 297, 
307, 312; criticisms of, 545 
Family relations, 44, 293 

Federal support of education, 108, 124,
171; for parochial and private schools, 
180; see also finance 

Finance, evaluation of research in, 187; 
needed research, 140, 187; protection 
of funds, 158, 186; reports, 154, 202; 
see also accounting, bonds, budgeting, 
business administration, capital outlay, 
costs of education, debt service, econ- 
omies, insurance, junior colleges, legisla- 
tion, salaries, support of education, tax- 
ation, transportation of pupils, tuition 
fees, unit costs 

Finance and business administration, 103- 
212 

Fire, see insurance, safety 

Floors, cleaning, 371; maintenance, 372; 
material, 380 

Foods, see home economics 

Foreign education, buildings, 413, 481 

Foreign language, achievement in, 507; 
bibliography, 37, 87; in homes, 499; 
prognosis, 36; research in progress, 37; 
teaching of, 34; see also nationality 


Genetic studies, 549; see also emotions, 
development of; growth; language, de- 
velopment of; mental development; per- 
sonality, development of; physical de- 
velopment; play; preschool children; 
social adjustment and behavior 

Geography, see maps, social studies 

Geometry, achievement in, 504; see also 
mathematics 

Grammar, see English 

Graphology, 289 

Graphs, see nomographs, profiles 

Grounds, see playgrounds, sites 

Growth, curves, 235, 506; see also genetic 
studies for cross references 

Guidance, 542 


Handedness, 234, 265; see also laterality 

Handwriting, 508; see also graphology 

Health, of teachers, 378 

Health and physical education, 39; bibli- 
ography, 88; equipment, 386, 424; 
measurement, 40; needed research, 41; 
objectives, 40 

Heating, 373, 392; bibliography, 476; 
equipment costs, 408; see also ventila- 
tion 

Higher education, buildings, 423, 483; 
capital outlay, 458; court decisions, 
458; measurement, 497, 536; taxation, 
458; see also accounting, academic
success 

History, see social studies 

Hollerith tabulating machines, 540 

Home, see environment 

Home economics, bibliography, 90; meas- 
urement, 45; teaching, 42 

Homogeneous grouping, see ability group- 
ing 

Household arts, equipment, 385 


Illumination, see lighting 

Incidental teaching, in social studies, 72 

Index numbers, 410 

Indians, see nationality 

Individual differences, see adapting in- 
struction to pupils; variability between 
individuals 

Individualized instruction, see adapting 
instruction to pupils 

Industrial arts, bibliography, 92;
curriculum, 48; equipment, 385; general
shop, 48; history of, 47, 50; objectives, 
47; psychology, 47; teaching, 47; visual 
aids, 49 

Infants, see preschool children 

Insurance, 167; bibliography, 205; court 
decisions, 455; fire, 375; see also 
finance 

Integrated program, 27 

Intelligence, and achievement, 499; and 
clerical ability, 257; and delinquency, 
254; and driving, 264; and health, 244;
and mechanical ability, 259; and moral 
behavior, 245; and music, 245; and 
occupations, 242, 251; and personality, 
245; and physical defects, 253; and 
school success, 247; and sex differences, 
244; surveys of, 246; testing programs, 
246; see also accomplishment quotient, 
environment, racial differences 

Intelligence tests, applications of, 241; 
bibliography, 318, 327; for infants and 
preschool children, 229; group, 223; 
incentives, 225; individual, 221; non- 
verbal, 227 

Internal accounting, see accounting 

Interpretation, of test scores, 540 

Interests, 275; of pupils, 43, 45, 64; sur- 
veys, 298 

International relations, 69 

Interviews, 279; as tests, 222, 265; for 
diagnosis, 505; in research, 45; in 
teaching, 61, 71 

Inventories, 391; see also accounting, 
property 

Item analysis, 307, 311


Janitors and custodians, 369 

Judgments, 281; see also rating 

Junior colleges, support, 129; see also 
finance, higher education 


Kindergarten, equipment, 381 
Kitchen, see lunchrooms 


Laboratories, see physical science 

Landscaping, see sites 

Language, development of, 236, 254 

Laterality, 504 

Laundry, 376 

Left-handedness, see laterality 

Legislation, budgets, 133; school build- 
ings, 123; taxes, 110; see also court 
decisions, finance 

Libraries, equipment, 381; junior col- 
leges, 423; schools, 31; see also reading 

Lighting, 374, 376, 380, 382, 399, 415, 
425; bibliography, 477; equipment 
costs, 408; needed research, 401, 407; 
see also painting of buildings, windows 

Literature, measurement, 528; teaching 
of, 29 

Local School Units Project, 437 

Local support of education, 184 

Lunchrooms, 144; court decisions, 458; 
equipment, 382 


Maintenance, of grounds, 378; of school 
plant, 369; of walls, 377; see also floors, 
operation of school plant, painting of 
buildings 

Manual training, see industrial arts 

Maps, 387; reading of, 72 

Marital relations, 292 

Marks, 499 

Mathematics, achievement in, 54; bibli- 
ography, 95; curriculum, 56; drill, 57; 
prognosis, 51; psychology of, 51; and 
reading, 55; remedial instruction, 57; 
sex differences, 55; teaching of, 51; 
vocabulary, 55; see also algebra, arith- 
metic, geometry 

Measurement, bibliography, 566; criti- 
cisms of, 545; frequency of, 21; his- 
tory, 547; in higher education, 497; in 
research, 497; incentives, 225; influ- 
ence on education, 529; needed re- 
search, 316; of appreciation, 29; of 
attitudes, 45, 63; of financial ability, 
effort and need, 174, 182; of intangible 
outcomes, 544; of visual defects, 33;
philosophy of, 531; purposes, 528;
trends, 546; unit of, 308; see also 
evaluation, factor analysis, objectivity, 
rating, regional testing, reliability, 
scaling, tests and scales, validity, par- 
ticular subject field 

Mechanical aptitudes, 258 

Memory, 226, 236 

Mental development, see intelligence,
genetic studies for cross references 

Mental tests, see psychological tests 

Methods of teaching, see particular sub- 
ject field 

Mobility of population, 173 

Moral behavior, 11, 245, 302; see also 
cheating 

Motion pictures, 509; in schools, 384; see 
also observation, photographic record- 
ing, visual aids 

Motivation, 11 

Motor abilities, 226, 233, 244, 259; see 
also physical development 

Music, achievement in, 499, 510; and in- 
telligence, 245, 499; aptitude, 262; 
bibliography, 58, 96; equipment, 386; 
history, 58; in American history, 70; 
measurement of, 231; nationality, 510; 
predicting, 510; prognosis, 58; psy- 
chology of, 237; surveys, 58; teaching 
of, 58 


Nationality, and art, 8; and language, 
499; and music, 510; and reading, 503; 
see also racial differences 

Nature and nurture, see environment 

Needed research, 187; art, 10; finance, 
140; health and physical education, 41; 
intelligence testing, 256; lighting, 401; 
measurement, 316; on school costs, 153; 
on school plant, 460; on transporta- 
tion, 431; personality study, 217; psy- 
chological tests, 217; reporting intel- 
ligence scores, 250 

Negroes, support of education, 174, 180; 
see also racial differences 

Nomographs, 307 

Norms, 542; use of, 525

Nursery school, equipment, 381 


Objective tests, see tests and scales 

Objectives, broadened, 533 

Observation, 231, 237, 239, 549; by mo- 
tion pictures, 15; by teachers, 515; of 
behavior, 289; of pupils, 534; varia- 
tion in, 62; see also photographic 
recording 


Review of Educational Research, Vol. VIII, No. 5

Occupations, and fecundity, 243; and in- 
telligence, 242, 251; see also environ- 
ment, vocational aptitude 

Office, equipment, 381; planning, 381 

One-teacher schools, 128, 151; see also 
size of schools 

Operation of school plant, 369; bibli- 
ography, 466; see also engineers, floors, 
heating, janitor, laundry, lighting, 
swimming pools, ventilation 

Organization, see school organization 


Painting of buildings, 376, 405; see also 
lighting 

Parent-child relationships, 277 

Parochial schools, support, 180 

Personality, and delinquency, 255; and 
intelligence, 245; development of, 42, 
237; measurement of, 269, 340; needed 
research, 217; rating, 281; studies of, 
229; surveys, 292; see also adjustment, 
psychological tests and cross references, 
rating 

Phonics, 501 

Photographic recording, 233, 237, 240 

Physical defects, 244 

Physical development, 244, 315; see also 
motor abilities, genetic studies for 
cross references 

Physical science, equipment, 385, 423 

Pictures, 237; see photographic recording 

Planning, 120, 145; bibliography, 194; 
see also budgeting 

Play, 231, 239 

Playgrounds, 386; court decisions, 451; 
see also sites

Plumbing, 397; costs, 408; see also sani- 
tation, toilets 

Poetry, 507, 511 

Population, mobility of, 173; prediction, 
421 

Prediction of academic success, see aca- 
demic success 

Preschool children, 229; tests, 320 

Prevention, of achievement deficiencies, 
513 

Professional aptitudes, 261 

Profiles, 543; occupational, 266 

Protection of funds, see finance 

Psychological tests and scales, 308; see 
also aptitudes, attitudes, character, in-
telligence, interests, moral behavior,
needed research, personality, social ad- 
justment and behavior, surveys, voca- 
tional aptitude 




Psychological tests and their uses, 213-364

Psychology, educational, bibliography, 80;
measurement, 21; prognosis, 20; teach-
ing of, 20; for psychology of school
subjects, see particular subject field

Psychology and methods in the high 
school and college, 1-102 

Public relations, 123, 131, 154; see also 
reports 

Purchasing, 168, 387; see also business 
administration 


Racial differences, 230, 234, 237, 243, 295 

Racial prejudices, 279, 300 

Radio, education, 36, 44; equipment, 383 

Rating, of personality, 281, 307; scales, 
231 

Reading, achievement, 501; and intelli-
gence, 502; and nationality, 503; and
visual efficiency, 504; diagnosis, 503, 
514; difficulty of material, 32; effect 
of typography, 503; errors, 236; evalu- 
ation of tastes, 30; extensive, 29; in 
college, 503; interests, 30, 276; of 
periodicals and newspapers, 31; rate, 
33; readiness, 247, 501; remedial, 32, 
502, 514; speed of, 20; see also phonics 

Reasoning, 236 

Recording equipment for speech study, 29 

Records, 291; pupil, 246, 534; test, 249; 
see also anecdotal records, photographic 
recording 

Regional testing, 543 

Reliability, 309; of essay tests, 521 

Religious attitudes, 279 

Remedial instruction, 513; in mathe- 
matics, 57; in reading, 32 

Repairs, see maintenance 

Reports, 154; see also finance, public 
relations 

Research, evaluation of, 34; in govern- 
ment, 462; in industry, 461; on school 
buildings, 460, 490; see also experi- 
mentation, measurement, needed re- 
search 

Rural education, 182 


Safety, 49, 263, 375; see also accidents, 
automobile driving 

Salaries, custodians, 370; government, 
164; schedules, 163, 204; teachers, 163; 
see also finance 

Sampling, 239, 310; see also statistical 
methods 

Sanitation, 397; bibliography, 476; see
also plumbing, toilets 

Scales, see scaling, tests and scales, par-
ticular subject field 

Scaling, 307; see also measurement, unit 
of; scoring 

Scholastic aptitude, see academic success 

School buildings, adaptation to educa- 
tional programs, 445; architecture, 443; 
bibliography, 466, 476, 480, 481, 483, 
486, 487, 488, 490; costs, 421; court 
decisions, 451; effect of size of units, 
432; materials, 447; needed research, 
460; programs, 419; surveys, 418, 432; 
trends, 443; see also capital outlay, 
equipment, extermination of pests, 
floors, heating, insurance, lighting, 
maintenance, operation, sites, ventila- 
tion 

School districts, 127, 432; reorganization, 
122, 126, 151, 181, 432; see also con-
solidation, administrative units

School finance, evaluation of research, 
187; needed research, 187 

School law, 451; see also court decisions 

School organization, 126; bibliography, 
196; see also administrative units 

School plant and equipment, 365-492 

School surveys, 146; see also school build- 
ings, surveys 

Science, bibliography, 97; errors, 60; ob- 
jectives of, 63, 65; teaching of, 60; 
see also physical science 

Scientific attitudes, 63 

Score-card, 379 

Scoring, 307, 311, 537; bibliography, 564; 
machine, 538; of essay tests, 521 

Seating, 380 

Sex differences, in art, 8; in intelligence, 
244; in mathematics, 55; in music, 262; 
in reading, 503 

Shop, see industrial arts 

Shorthand, teaching of, 17 

Sight saving classes, see lighting, special 
schools and classes 

Sites, court decisions, 452; landscaping, 
425; maintenance, 378, 425; surfacing, 
378; see also playgrounds 

Size of schools, 127, 151; see also one- 
teacher schools 

Social adjustment and behavior, 43, 238; 
measurement of, 231 

Social attitudes, 67 

Social studies, achievement in, 67, 509;
bibliography, 98; contemporary prob-
lems, 67; curriculum, 69; measurement,
68; prediction, 510; teachers, 72; 
teaching of, 70; vocabulary, 71; see 
also current events, consumer education 

Sound recording, 498, 507; of tests, 498 

Special schools and classes, 401, 416 

Speech, defects, 255; recording equip- 
ment, 29 

Spelling, 36; measurement of, 507 

State school systems, 128; see also state 
units 

State studies of administrative units, 432 

State support of education, 108, 127, 130, 
180; see also finance, state units, sup- 
port of education 

State units, 131; see also state school 
systems 

Statistical methods, 307; tests of sig- 
nificance, 310; see also correlation, fac- 
tor analysis, nomographs, sampling, 
tabulating machines, tests and scales, 
weighting 

Stenographic reports, 71 

Study, methods, 20, 22, 32, 60, 71 

Summer workshops, 536 

Superstitions, 63 

Supplies, 387; purchase and storage, 168 

Support of education, 171; bibliography, 
190, 206; see also federal support of 
education, local support of education, 
state support of education, taxation 

Surveys, intelligence, 246; school build- 
ings, 418, 432, 483 

Swimming pools, 379 


Tabulating machines, 307, 309 

Taxation, 107, 185; of higher education 
institutions, 458; of school property, 
455; see also finance, support of edu- 
cation 

Teacher training, industrial arts, 49; so- 
cial studies, 72 

Teachers’ salaries, see salaries 

Teachers, traits, 304 

Teaching, methods of, see particular sub- 
ject field, teaching 

Testing, frequency of, 71 

Tests, see tests and scales 

Tests and scales, by sound, 498; con- 
struction of, 311, 357; effect of kind 
on study, 22; essay tests, 22, 28, 517, 
559; length of, 21; objective tests, 498, 
518; studies of, 497; see also item 
analysis, measurement, norms, profiles, 
psychological tests, reliability, scoring, 
validity, particular subject field 

Textbooks, 169; analysis, 49, 69, 71

Toilets, 397 

Transfer of training, 8, 20, 64 

Transient youth, 499 

Transportation of pupils, 126, 150, 426; 
bibliography, 485; costs, 427, 431; 
needed research, 431; see also business 
administration, finance, unit costs 

Tuition fees, 129 

Typewriting, psychology of, 15; teaching 
of, 15 


Unit, see measurement 

Unit costs, 148, 158; buildings, 409, 421; 
see also costs of education, finance, 
transportation of pupils 

Units, bibliography, 486; see also admin- 
istrative units, attendance units, con- 
solidation, county units, school districts, 
school organization, size of schools, state 
units, unit costs 


Validity, 311; need for, 497 

Variability, between abilities, 241; be- 
tween individuals, 241, 258, 525; be- 
tween pupils, 8; in occupational groups, 
267; in performance, 264; in personal-
ity measurements, 271; see also ability
grouping, adapting instruction to pupils, 
reliability, sex differences 

Ventilation, 382, 392, 414; bibliography, 
476; see also heating 

Visual aids, 384, 425, 510; in industrial
arts, 49; in typing, 15; see also maps 

Visual defects, 33; and lighting, 399 

Visual efficiency and reading, 504 

Vital statistics, see birth rates, eugenics 

Vocabulary, 27, 502; algebra, 55; art, 9; 
burden, 32; and intelligence, 248, 253; 
of preschool children, 231; social studies 
texts, 71; test, 231 

Vocational aptitude, 257, 304; tests, 304; 
see also academic success, professional 
aptitudes 

Vocational education, support of, 179; 
see also commercial subjects; industrial 
arts 

Vocational guidance, 304 


Weighting, 309, 312

Windows, 380, 403; see also lighting 


Youth, see transient youth 